Abstract


This thesis investigates two aspects of the simulation of quantum systems 
with classical computers, i. e., ordinary computers, as opposed to quantum 
computers. 

Part I treats the simulation of quantum computers and other
quantum-information tasks. While it is most likely not possible to simulate
the operation of a quantum computer efficiently on a classical one, certain
aspects of its operation can be simulated, especially so-called Clifford
operations, as a theorem by Gottesman and Knill asserts. The thesis reviews
the theory leading to this result, the stabilizer formalism, and its
connection to the concept of graph states. This is then used to develop a
way of performing such a simulation that differs from, and is more efficient
than, those previously employed, namely by using the graph-state formalism.
This new simulator is applied to investigate the properties and performance
of various entanglement-purification protocols. Furthermore, a strategy for
using the simulator to study the conditions for fault-tolerant operation of
quantum computers is presented.

Part II deals with variational methods to find and study ground states 
of spin systems. After reviewing the existing techniques, the thesis explores 
in depth two new ansatzes: The weighted graph states are derived from a 
generalization of the graph state concept employed in Part I and seem to 
be a promising variational class of states due to their specific entanglement 
properties. Their viability for the purpose of approximation of ground states 
is studied for different settings. Finally, another method, based on so-called 
tensor-tree states, is introduced in the context of its related concepts (matrix- 
product states and projected entangled-pair states), and algorithms for their 
use are developed. 
Abstract (translated from the German version)


This thesis investigates two aspects of the simulation of quantum systems
on classical computers, i. e., on ordinary computers rather than quantum
computers.

Part I is concerned with the simulation of quantum computers and other
tasks of quantum information processing. While it is most likely impossible
to simulate the operation of a quantum computer efficiently on a classical
computer, certain partial aspects can very well be simulated, in particular
(as the Gottesman-Knill theorem states) so-called Clifford operations. The
present thesis presents the theory leading to this result, as well as its
connection to the related concept of graph states. This is then used to
develop a new way of carrying out such simulations, in which the use of the
graph-state calculus achieves better performance than previous methods.
This new simulator is then used to investigate the properties and
performance of various entanglement purification protocols. In addition, a
new method is presented for using the simulator to investigate the
conditions for fault-tolerant operation of a quantum computer.

Part II examines variational methods for the study of ground states of spin
systems. The thesis first gives an overview of existing techniques and then
pursues two new approaches: The weighted graph states are a generalization
of the graph states treated in Part I and, owing to their entanglement
properties, appear to be a promising class of variational states. Their
actual suitability for the approximate representation of ground states is
tested for various systems. Finally, a further method, based on so-called
tensor-tree states, is introduced in the context of related concepts
(matrix-product states and projected entangled-pair states), and algorithms
for their use are developed.
Chapter 1

Introduction 


Only very few problems in physics can be solved exactly or even in closed
form. In the vast majority of cases, more or less sophisticated numerical
techniques are needed. As the computational effort may be large, the number
and kinds of problems accessible to computational physics increase steadily
with the ever-increasing power of computers. Many tasks scale in a
characteristic way with the problem size. The canonical example is a system
of $N$ interacting, point-like, classical particles: this system has a phase
space of dimension $6N$, and tracking the time evolution caused by the
Hamiltonian, i. e., following a trajectory in phase space, means working
with an amount of data linear in the system size. The computational effort
to describe generic classical systems typically scales as a low-degree
polynomial in $N$. Thus, simulations of many-body systems are now possible
for quite large particle numbers $N$. Nevertheless, classical many-body
physics is far from being fully explored.

In quantum-mechanical problems, computations are much more difficult
because the dimension of the Hilbert space of a quantum system grows
exponentially with system size, as opposed to the linear scaling in the
classical case. Systems can thus become intractable even for very few
degrees of freedom.
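To make the contrast concrete, the following minimal sketch counts the real
numbers needed to store one classical phase-space point and one generic pure
state of $N$ two-level systems. The function names, and the convention of
counting two real numbers per complex amplitude, are illustrative
assumptions and not part of the thesis.

```python
def classical_phase_space_dim(n_particles):
    """Real numbers needed for one phase-space point of N point particles."""
    return 6 * n_particles  # 3 position + 3 momentum coordinates per particle


def quantum_state_reals(n_sites, d=2):
    """Real numbers needed for a generic pure state of N d-level systems."""
    return 2 * d ** n_sites  # d**N complex amplitudes, 2 reals each


for n in (10, 20, 40):
    # Linear growth on the classical side, exponential on the quantum side.
    print(n, classical_phase_space_dim(n), quantum_state_reals(n))
```

Already at 40 two-level systems the second count exceeds $10^{12}$, which
illustrates why a brute-force representation of the state becomes
impractical so quickly.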

However, for reasons related to the notion of complementarity in quantum
mechanics, the amount of information that can be extracted from a physical
quantum system by means of a specific measurement is polynomial, even if an
exponential amount of information is needed to describe the quantum state.
Hence, even though we are only interested in a polynomial amount of data, we
seem to have to go through an exponentially complex, inaccessible wave
function. In the general case, this causes insurmountable problems for any
conventional computing device, but it is a feat that nature achieves
constantly, if one thinks of the working of the laws of nature as a vast
computation. This line of reasoning brought Feynman to his famous idea that,
although a full treatment of a general quantum system with any "normal"
computer may be impossible, one quantum system could simulate another
quantum system in a fruitful manner [Fey82]. Hence, to advance our
understanding of all varieties of quantum systems, a computing device that
"works" in the full volume of its quantum-mechanical Hilbert space and not
just in its classical states [Fey85] would be a tool of tremendous value.
While the realization of a quantum computer is still very far away,
considerable progress has been made since Feynman first mentioned this idea.
First, Deutsch [Deu85] noticed that a quantum computer may violate the
quantitative Church-Turing principle. This principle asserts that all
physically realizable models of universal computers can simulate each other
with at most polynomial overhead in time. In other words, a problem that can
be solved within one computational model in time polynomial in the size of
the input data can also be solved in polynomial time in all other
"reasonable" models of universal computers, possibly with a polynomial of
another degree. This "transitivity" of the notion of "solvable in polynomial
time" motivates the introduction of the term efficiently solvable for it. In
this thesis, the term "efficient" shall always mean "in polynomial time",
following the parlance of computer scientists.

Deutsch could only give an example in the so-called query setting, i. e.,
the computer is required to solve a problem whose answer the experimenter
knows. Hence, Shor's discovery [Sho94] of explicit examples of mathematical
tasks that a quantum computer might perform exponentially faster than a
classical computer (namely the factorization of large integers and the
discrete logarithm) vastly increased interest in the new field of quantum
information science. To be precise, there still is no proof of the
impossibility of an efficient classical algorithm for these problems, and
thus the claim that quantum computers break the Church-Turing principle is,
strictly speaking, still only a conjecture, albeit one very widely believed
to be true. This conjecture is formally written as
$\mathrm{BPP} \subsetneq \mathrm{BQP}$, where BQP is the complexity class^1
of all problems that a quantum computer can solve efficiently, and BPP is
the class of all problems that a classical computer^2 can solve efficiently.

A proof of this would imply that $\mathrm{P} \neq \mathrm{PSPACE}$ [BV93],
which is an open problem of complexity theory on par with the infamous
$\mathrm{NP} \stackrel{?}{=} \mathrm{P}$ problem. Hence we use the (common)
assumption that $\mathrm{P} \neq \mathrm{NP}$ in the following. Furthermore,
one typically assumes that quantum computers cannot efficiently solve all
problems in NP (the class of all problems for which a solution can be
verified, but not necessarily found, in polynomial time by a classical
computer). Strong evidence for this assumption is given in [BBBV97].


^1 For a thorough introduction to (classical) complexity theory, see
[Pap94]. Aaronson's "Complexity Zoo" [AK] is a useful resource that aims to
provide definitions and references for all complexity classes (classical and
quantum) discussed in the literature.

^2 The term "classical computer" is defined more precisely as a computer
that can efficiently simulate a Turing machine and can be efficiently
simulated by a Turing machine, where a Turing machine is an abstract concept
suggested by Turing to capture the notion of such a universal computer. For
all "reasonable" computational models that seem physically realizable and
universal (Turing-complete), such equivalence to a Turing machine has been
formally proven, with the important exception of the quantum computer.

The first important task in designing a quantum computer is to identify
quantum systems with a suitable discrete state space. Each element of such a
system is usually meant to represent one bit of extractable information and
is called a qubit. The two classical bit values 0 and 1 are represented by
two quantum levels of the qubit's relevant degree of freedom. An assembly of
$N$ such qubits, called a quantum register, can hence be described using a
product basis formed by the qubit levels, where each basis state corresponds
to one of the integers $0, \dots, 2^N - 1$. The availability of arbitrarily
complicated superpositions of these basis states, i. e., of highly entangled
states, is what sets a quantum computer apart from a classical one. This
importance of entanglement is also the reason why the study of entanglement,
especially of its quantification, advanced in parallel with the study of
quantum computation and is considered part of quantum information theory.

Let us hence digress to very briefly review some basic notions of
entanglement. When combining two spatially separated systems with Hilbert
spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, the Hilbert space of the
composite system is the tensor product
$\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$. Any state
$|\psi\rangle \in \mathcal{H}$ may be written in the form

\[
  |\psi\rangle = \sum_{i=1}^{\chi} \lambda_i \,
      |\varphi_i^A\rangle \otimes |\varphi_i^B\rangle,
  \qquad \lambda_i \in \mathbb{C} \setminus \{0\}, \quad
  |\varphi_i^A\rangle \in \mathcal{H}_A, \quad
  |\varphi_i^B\rangle \in \mathcal{H}_B,
\]

and is called separable for $\chi = 1$ and entangled for $\chi > 1$. The
minimal number of terms, $\chi$, is called the Schmidt rank of
$|\psi\rangle$ with respect to the bipartition of the system into the parts
$A$ and $B$. In an entangled state, a measurement of one system can
influence the state of the other system, a paradoxical fact that Schrödinger
illustrated with his famous example of the dead-or-alive cat [Sch35]. This
influence seems to act instantaneously as a "spooky action at a distance",
as Einstein called it. He argued, together with Podolsky and Rosen in the
famous "EPR paper" [EPR35], that this paradox shows the incompleteness of
quantum mechanics: Additional inaccessible degrees of freedom, called
"hidden variables", are required to explain the correlations due to
entanglement without giving up the idea of reality being "local". However,
in a seminal work [Bel64], Bell analysed Bohm's formulation [Boh51, BA57] of
the EPR setup and showed that such an amendment is impossible, because the
correlations predicted by quantum mechanics can exceed any correlations that
may arise from local hidden variables. This result immediately posed the
question whether these counter-intuitive predictions of quantum mechanics
are actually correct, or whether they point to a flaw in the theory. The
question was settled in favour of quantum mechanics by a number of
experimental realizations of the EPR-Bohm experiment that tested Bell's
inequality (typically in the revised form of [CHSH69]), most famously the
one by Aspect and coworkers [AGR81]. The main focus of work on entanglement
theory is now to formalize, classify, and quantify the different types of
entanglement and to study its potential applications.^3 For a recent and
comprehensive review of the results found chiefly in the last one or two
decades, see [HHHH07].
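As a small illustration of the Schmidt rank defined above: for a bipartite
pure state written in product bases as
$|\psi\rangle = \sum_{a,b} C_{ab}\, |a\rangle \otimes |b\rangle$, the
Schmidt rank equals the rank of the coefficient matrix $C$. The sketch below
computes that rank by plain Gaussian elimination; the function name
`schmidt_rank`, the nested-list input format, and the numerical tolerance
are assumptions made only for this example.

```python
def schmidt_rank(coeffs, tol=1e-12):
    """Rank of the coefficient matrix C[a][b] of |psi> = sum_ab C[a][b] |a>|b>."""
    rows = [[complex(v) for v in row] for row in coeffs]
    rank = 0
    for col in range(len(rows[0])):
        # Among the rows not yet used as pivots, pick the largest entry in this column.
        pivot = max(range(rank, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


s = 2 ** -0.5
product_state = [[1, 0], [0, 0]]              # |0>|0>
plus_plus = [[0.5, 0.5], [0.5, 0.5]]          # (|0>+|1>)(|0>+|1>)/2
bell_state = [[s, 0], [0, s]]                 # (|00> + |11>)/sqrt(2)
print(schmidt_rank(product_state), schmidt_rank(plus_plus), schmidt_rank(bell_state))
```

The first two states are separable (rank 1) even though the second has four
nonzero amplitudes, whereas the Bell state needs two product terms.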

The difficulty of actually building a quantum computer stems from two
conflicting requirements: On the one hand, it must be possible to interact
with the qubits in order to manipulate the stored quantum information and
carry out the computation. On the other hand, the system must be extremely
well decoupled from the environment, as such coupling would cause
decoherence, i. e., drive the system towards its classical states rather
than allow it to stay in highly entangled quantum states. (See, e. g.,
[Zur03] for a treatment of decoherence theory.) The challenges that these
requirements pose for the designer of a quantum computer are analyzed in the
classic work [DiV00]. That such a quantum computer is also useful for
simulating quantum systems, the original motivation, has been shown by Lloyd
[Llo96].

Many different systems have been examined for their potential as qubits for
a quantum computer; the most promising suggestions include ions in a linear
trap [CZ95], atoms in optical lattices [BCJD99], quantum dots in a
semiconductor [LD98], flux quanta in superconducting structures [MOL+99,
MSS99], nuclear spins in solids interacting with microwaves [Kan98], and
photons in linear-optics setups [KLM01].

Identifying a good system to use as a quantum register and learning to
control it is one task on the road to a quantum computer. Another one is to
learn how to operate it, most importantly in such a way that errors due to
unavoidable noise do not accumulate. To study such questions, it is
advantageous to disregard the physical nature of the qubits and consider
them as parts of an abstract composite quantum system. A qubit is then an
entity described by a Hilbert space $\mathbb{C}^2$ that can be entangled
with other qubits to form the $N$-qubit Hilbert space
$\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$. The interactions imposed by the
operator to carry out the computation are seen as a sequence of unitary
operations acting on $\mathcal{H}$. Each of these operations is taken from a
small set of elementary gates, which is chosen such that any unitary can be
formed from them. (In fact, such a universal gate set can be as small as
just three specific gates, each having support on only one or two qubits. A
proposal for a physical realization of a quantum computer hence "only" needs
to show that the gates of one of the universal gate sets can be implemented
in a way that is controlled sufficiently precisely.) Measurements to read
out data are projection operations, and noise, i. e., uncontrolled
interaction with the environment, can be described by superoperators.

It is this abstraction in which quantum information theory (as treated in
textbooks such as [Pre98b, Gru99, NC00, BEZ00, BL07]) usually resides, and
a multitude of questions may be studied in it.

A much discussed question is whence a quantum computer gets its power. Where
precisely is the boundary between calculations that may be performed
efficiently on a classical computer and those that require the "quantumness"
of a quantum computer? To map out this boundary, one may ask which
operations of a quantum computer (idealized in the sense just explained) can
still be efficiently simulated on a classical device and which require a
quantum computer (assuming, as stated above, that there is a difference
between BQP and BPP). Two of the most important results on this question,
namely the Gottesman-Knill theorem [Got98a] and Vidal's "slightly entangled
quantum computations" [Vid03], are important starting points for this
thesis.^4

^3 For example, the Schmidt rank just mentioned is a proper entanglement
measure [EB01].

The thesis is divided into two parts. Part I deals with simulations based on
the Gottesman-Knill theorem. This theorem asserts that a quantum computation
can be simulated efficiently if the quantum operations performed are
restricted to a certain discrete subgroup of the group of unitaries, known
as the Clifford group. Chapter 2 reviews the formalisms of stabilizer states
and graph states and presents results on the structure of the local Clifford
group. In Chapter 3, which is a reprint of our publication [AB06], these
results are used to derive an algorithm that performs simulations of the
kind covered by the Gottesman-Knill theorem in a very efficient manner. In
Chapter 4, a proposal is made for how this simulation technique may also be
used to study the fault tolerance of quantum computers. Our implementation
of the simulation algorithm is used in Chapter 5, a reprint of [KADB06], to
simulate and compare entanglement purification protocols.

The question about the boundary between classical and quantum computations
also led to interesting results on the simulation of quantum systems with
classical computers. First, it was soon realized that strong entanglement
between the qubits of the quantum register can cause the device to leave the
realm of what can be classically simulated efficiently, i. e., in polynomial
time. A notion such as "strong entanglement" naturally requires ways to
quantify entanglement, and this has been a major topic of quantum
information theory. (For a recent review, see [AFOV07].) Vidal showed that
the evolution of a quantum register that stays only slightly entangled (in a
certain well-defined sense) can be simulated efficiently on a classical
computer [Vid03]. He then realized that this gives rise to a technique to
simulate quantum spin chains [Vid04] that is remarkably similar to the
so-called density-matrix renormalization group algorithm [Whi92, Whi93].
This somewhat unexpected meeting of quantum information science and
solid-state physics, more explicitly explored in [VPC04], may have marked
the beginning of a fruitful train of research that led to the development of
a whole range of new techniques for the simulation of quantum systems on
lattices. Part II of the thesis is situated in this context. In Chapter 6, I
attempt to give an overview of these new techniques as well as of older,
established numerical techniques of solid-state and statistical physics
research. This overview is meant to allow for a better understanding of the
context of the following chapters.

^4 By now, the boundary of what can be simulated classically has been pushed
out a bit further: For example, evolution constrained to p-blocked states
(products of pure states each of which shows entanglement across at most p
qubits) can be tracked efficiently [JL03]. These results have been used, for
example, to study to what extent the quantum Fourier transformation (the
core of Shor's algorithm) can be simulated efficiently, see [Bro07] and
references therein.

We have developed a variational method based on so-called weighted graph
states, which is presented in Chapters 7 (a reprint of [APD+06]) and 8
(accepted for publication, preprint [ABD07]). We are currently working on
another variational numerical technique, based on a class of states
introduced as tree tensor networks in [SDV06]. Chapter 9 shows how we have
expanded the tensor-tree ansatz to a variational technique and combined it
with the weighted-graph scheme. This is still work in progress, and hence
only preliminary results are presented.

All reprints in this thesis reproduce the text in full and without
alterations; only the references have been merged into a single common
bibliography, to be found in Appendix B. The thesis has been written with
the aim of providing a self-contained exposition of the subject matter, and
it should be possible to read it from cover to cover despite the mixing of
reprints and new text.
Chapter 2

Stabilizer states and Clifford 
operations 

When dealing with finite-dimensional Hilbert spaces, elementary calculations
are usually performed in the so-called computational basis. When describing
a set of $N$ distinguishable $d$-level systems with Hilbert space
$\mathcal{H} = (\mathbb{C}^d)^{\otimes N}$, this term denotes the basis
consisting of the product states

\[
  |\mathbf{s}\rangle := |s_1, \dots, s_N\rangle
      := \bigotimes_{a=1}^{N} |s_a\rangle,
  \qquad s_a \in S, \quad \mathbf{s} \in S^N,
\]

where $S := \{0, \dots, d-1\}$ denotes the set of levels of each $d$-level
system. In the following, we shall call each of these $d$-level systems (in
a way synonymously) a "site", a "spin", or a "qudit". We avoid the
expression "particle", as we will (later, in Part II) also consider local or
collective excitations of these spins or sites as quasi-particles.

As the dimension of the Hilbert space $\mathcal{H}$ grows exponentially with
the number $N$ of sites, $\dim \mathcal{H} = d^N$, a basis representation is
of use for symbolic and especially numerical calculations only in the case
of very small $N$, unless one only considers product states.

For entangled states, a formalism that allows for a succinct representation
of at least some highly entangled states is most useful. One such formalism
is the stabilizer formalism, a calculus that allows one to deal with
precisely those entangled states which are of importance for certain
subfields of quantum information science, namely entanglement purification
and quantum error correction. The following section (Sec. 2.1) gives a
rather brief overview of stabilizer states. It does not aim to provide more
information than strictly needed for the purpose of this thesis; the reader
may want to learn more about them from the respective chapter in Nielsen and
Chuang's textbook [NC00], or from Gottesman's PhD thesis [Got97], which is
not only the first longer publication to present this topic in depth but
also still one of the most authoritative ones.

A subclass of the stabilizer states are the so-called graph states. As they
are introduced twice within the reprinted publications (in Sec. 3.2, and at
length in Sec. 5.2.1), we stay rather brief on them in the present chapter
and mention them in Sec. 2.2 in order to motivate the study of the local
Clifford group, which is the main topic of Sec. 2.2, via its role in
mediating between stabilizer and graph states.


2.1 Stabilizer states and the Gottesman-Knill theorem 


The stabilizer formalism was developed in the context of research on quantum
error correction. It is systematically exposed in [Got97] and is also
treated at length in the textbook [NC00]. It is usually employed for
treating qubit systems, i. e., $d = 2$, and we shall restrict ourselves to
this case, too, when we review in the remainder of this chapter those parts
of it that are of relevance for the present work.

We start with a few simple definitions: 


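Definition 2.1 The Pauli group on $N$ qubits, $\mathcal{P}_N$, is the set of
all tensor products of the form

\[
  \zeta \bigotimes_{j=1}^{N} \sigma_j, \qquad
  \zeta \in \{\pm 1, \pm i\}, \quad
  \sigma_j \in \mathcal{P} := \{I, X, Y, Z\},
\]

where the elements of $\mathcal{P}$ are the identity operator and the Pauli
matrices:

\[
  I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
  X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
  Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
  Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
  \tag{2.1}
\]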


Definition 2.2 A quantum state
$|\psi\rangle \in \mathcal{H} = (\mathbb{C}^2)^{\otimes N}$ is said to be
stabilized by a linear (typically Hermitian) operator $U$ acting on
$\mathcal{H}$ if $|\psi\rangle$ is an eigenstate of $U$ with eigenvalue 1,
i. e., $U |\psi\rangle = |\psi\rangle$.


Obviously, if two linear operations $U_1$ and $U_2$ both stabilize
$|\psi\rangle$, so does $U_1 U_2$. We may hence conclude that for a set $S$
of unitary operations stabilizing $|\psi\rangle$, all elements of the group
$\mathcal{S}$ generated by $S$ stabilize $|\psi\rangle$. The elements of
such a group can stabilize different states. If two states $|\psi_1\rangle$
and $|\psi_2\rangle$ are both stabilized by $\mathcal{S}$, so is any linear
combination of them, and hence the set of all states stabilized by
$\mathcal{S}$ is a vector space.
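These closure properties can be checked numerically on a small case. The
sketch below uses the two-qubit state $(|00\rangle + |11\rangle)/\sqrt{2}$
and the operators $X \otimes X$ and $Z \otimes Z$; this choice of example,
the helper names, and the nested-list matrix representation are assumptions
made for illustration only.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(op, vec):
    return [sum(o * v for o, v in zip(row, vec)) for row in op]


def stabilizes(op, vec, tol=1e-12):
    """True if op * vec == vec, i.e. vec is an eigenvector with eigenvalue +1."""
    return all(abs(a - b) < tol for a, b in zip(apply(op, vec), vec))


s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]                       # (|00> + |11>) / sqrt(2)
XX, ZZ = kron(X, X), kron(Z, Z)

print(stabilizes(XX, bell), stabilizes(ZZ, bell))   # the two operators
print(stabilizes(matmul(XX, ZZ), bell))             # their product
print(stabilizes(kron(Z, I2), bell))                # an operator outside the group
print(stabilizes(ZZ, [0.6, 0, 0, 0.8]))             # a combination of |00> and |11>
```

The last line illustrates the vector-space statement: $Z \otimes Z$ alone
stabilizes both $|00\rangle$ and $|11\rangle$ and therefore every
superposition of the two.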

We shall see shortly that stabilizers are especially interesting if they are
a subgroup of the Pauli group $\mathcal{P}_N$, because then the action of
certain unitary operations of importance in quantum information can be
easily tracked. Hence, it is only these stabilizers which are of interest in
the stabilizer formalism. But first observe the following:

Lemma 2.1 A subgroup $\mathcal{S} \subset \mathcal{P}_N$ of the Hermitian
(and unitary) operators acting on $\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$
that has $m$ independent, mutually commuting generators and does not contain
$-I$ stabilizes a vector space of dimension $2^{N-m}$.

Proof: See [NC00].

For $N$ independent generators, the stabilized subspace is one-dimensional
and contains only one normalized state, and it is such states that we deal
with in this part of the thesis.

Definition 2.3 A state $|S\rangle \in (\mathbb{C}^2)^{\otimes N}$ is called
a stabilizer state if there is a subgroup of the Pauli group $\mathcal{P}_N$
which stabilizes $|S\rangle$ and only $|S\rangle$.

The most important application of the stabilizer formalism, quantum error
correction, also deals with stabilized subspaces of more than just one
state, but for the purpose of the following chapters, we do not need this.

The relevant feature of stabilizer states is that they can be represented by
any set of generators of their stabilizer group. The stabilizer group of an
$N$-qubit stabilizer state is always generated by $N$ independent
generators, and each of these is (because we are within the Pauli group) a
tensor product of $N$ Pauli matrices with a prefactor
$\zeta \in \{\pm 1, \pm i\}$. Actually, $\zeta$ has to be $\pm 1$, because
in the case of $\zeta = \pm i$, the operator's square would be the negative
identity $-I$, which cannot be in any stabilizer group.
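Written out, with $\zeta$ denoting the prefactor and $\sigma_j$ the
single-qubit factors as in Definition 2.1, the argument uses only the fact
that the identity and every Pauli matrix square to the identity:

```latex
g = \zeta \bigotimes_{j=1}^{N} \sigma_j
\quad\Longrightarrow\quad
g^2 = \zeta^2 \bigotimes_{j=1}^{N} \sigma_j^2 = \zeta^2 I ,
\qquad\text{so}\qquad
\zeta = \pm i \;\Longrightarrow\; g^2 = -I .
% -I stabilizes no state: (-I)|S> = -|S>, which differs from |S> for |S> != 0.
```

Since a group contains the square of each of its elements, a generator with
prefactor $\pm i$ would force $-I$ into the group.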

After the following two definitions, we can see why stabilizer states are so
useful.

Definition 2.4 Conjugation of an operator $\sigma$ under a transformation
$U$ means the operation

\[
  \sigma \mapsto \sigma' = U \sigma U^\dagger .
\]

Given a subgroup $G$ of a larger group $\mathcal{G}$, the normalizer of $G$
is the set of all transformations in $\mathcal{G}$ that map under
conjugation every element of $G$ onto another (or the same) element of $G$:

\[
  \text{Normalizer of } G :=
  \{\, U \in \mathcal{G} \mid \forall \sigma \in G : U \sigma U^\dagger \in G \,\}.
\]

Note that a normalizer is always a group. In the following, we will
concentrate on the normalizer of the Pauli group.

Definition 2.5 The normalizer of the $N$-qubit Pauli group is called the
$N$-qubit Clifford group $\mathcal{C}_N$:

\[
  \mathcal{C}_N :=
  \{\, U \in SU(2^N) \mid \forall \sigma \in \mathcal{P}_N :
       U \sigma U^\dagger \in \mathcal{P}_N \,\}
  \tag{2.2}
\]

It is known (for a proof see [NC00]) that the Clifford group can be
generated by three operators, which can be chosen rather widely: Most
choices of two local (i. e., acting upon just one qubit) Clifford operators
and one two-qubit Clifford operator will generate the full Clifford
group.^1 A commonly taken set of generators is the Hadamard transform $H$,
the phase rotation $S$, and the controlled-NOT $\Lambda X$:

\[
  H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
  \qquad
  S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},
  \qquad
  \Lambda X = \begin{pmatrix}
      1 & 0 & 0 & 0 \\
      0 & 1 & 0 & 0 \\
      0 & 0 & 0 & 1 \\
      0 & 0 & 1 & 0
  \end{pmatrix}.
\]

^1 Strictly speaking, we need these generators acting on each qubit or pair
of qubits.

The Clifford group contains several important gates, such as all the Pauli
gates, the controlled phase gate $\Lambda Z = \mathrm{diag}(1, 1, 1, -1)$,
and, of course, the gates just listed as generators. In fact, one only needs
to add a single non-Clifford gate, e. g., the Toffoli gate^2, to obtain a
fully universal gate set [Sho96], i. e., any unitary can be approximated
arbitrarily well by a concatenation of finitely many gates from the set.^3

Given a stabilizer state $|S\rangle$ and a set of $N$ Pauli operators $g_i$
that generate its stabilizer group $\mathcal{S} \subset \mathcal{P}_N$, we
can easily track how the state changes under the action of any Clifford
operation $U \in \mathcal{C}_N$. We simply conjugate each generator with $U$
and, because $U$ is from the normalizer of $\mathcal{P}_N$, get a new set of
$N$ Pauli operators $U g_i U^\dagger \in \mathcal{P}_N$ that generate a
group $\mathcal{S}' \subset \mathcal{P}_N$ that stabilizes the state
$|S'\rangle = U |S\rangle$ (indeed,
$U g_i U^\dagger \, U |S\rangle = U g_i |S\rangle = U |S\rangle$). Is the
set of generators $\{ U g_i U^\dagger \mid i = 1, \dots, N \}$ now a
sufficient and useful description of the state $|S'\rangle$? It is
sufficient, as $|S'\rangle$ is the only state stabilized by $\mathcal{S}'$.
The description is also useful because, as we shall see next, we can easily
calculate the expectation value for any observable out of $\mathcal{P}_N$,
which allows for a complete "tomography" of the state.
Furthermore, the description is succinct: The Pauli group $\mathcal{P}_N$
has $4^{N+1}$ elements (at each of the $N$ sites, one of the three Pauli
matrices or the identity, and one of $\{1, i, -1, -i\}$ as overall
prefactor), and specifying an element within a suitable enumeration of
$\mathcal{P}_N$ thus needs $2N + 2$ bits.^4 The information needed to
specify all generators is hence $(2N + 2)N$ bits, i. e., quadratic in $N$.
The usual way to write down this information is in a "stabilizer tableau" of
$N$ rows, one for each generator, and $N + 1$ columns, the left-most one for
the prefactor (the sign "+" or "-") and the others containing one of
$I, X, Y, Z$ to indicate the operator acting on the corresponding qubit.
(See Fig. 3.1a on p. 43 for an example.)

^2 The Toffoli gate is a controlled-controlled-NOT gate, i. e., the third
operand qubit gets flipped iff the first and the second qubit are both 1.
The gate was introduced in the context of Toffoli's studies of reversible
classical computation [Tof80].

^3 This last fact is quite important for fault-tolerant quantum computing,
as we shall see in more detail in Ch. 4: So-called CSS codes [CS96, Ste96b],
which can easily be described with stabilizers, allow for a straightforward
implementation of gates in the Clifford group by simply letting them act in
parallel ("transversal gates") [Got98b]. One needs to add one further gate
to make universal quantum computation possible, and implementing the action
of this gate on the encoded qubits in a fault-tolerant way is the difficult
part. An overview of different choices can be found, e. g., in [Ste98b].

^4 Actually, only $2N + 1$ bits: The prefactor is always $+1$ or $-1$, as a
Pauli operator with prefactor $\pm i$ cannot be among the stabilizer
generators of a valid stabilizer state.

We shall now see that it is possible to efficiently simulate a quantum
computation by classical means if the computation stays in the space of
stabilizer states. Typically, the quantum register is initialized to
$|0\rangle^{\otimes N}$ at the beginning of the computation. This state has
a straightforward representation as a stabilizer tableau. When a unitary
operation is performed, the stabilizer tableau can be updated to reflect the
change iff the operation is in the Clifford group, as we have seen above.
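A minimal sketch of such a tableau and its update is given below. It stores
each generator as one $x$ bit and one $z$ bit per qubit plus a sign bit (so
that $(x, z) = (1, 0), (1, 1), (0, 1)$ stand for $X$, $Y$, $Z$), and uses
the widely used binary update rules for $H$, $S$, and the controlled-NOT.
This encoding, the dictionary layout, and the function names are assumptions
of the example; they are not the graph-state representation developed later
in the thesis.

```python
def initial_tableau(n):
    """Generators Z_1, ..., Z_n of |0...0>: x bits, z bits and a sign bit per row."""
    return [{"x": [0] * n, "z": [int(i == j) for j in range(n)], "r": 0}
            for i in range(n)]


def hadamard(tab, a):
    for row in tab:
        row["r"] ^= row["x"][a] & row["z"][a]               # Y -> -Y
        row["x"][a], row["z"][a] = row["z"][a], row["x"][a]  # X <-> Z


def phase(tab, a):
    for row in tab:
        row["r"] ^= row["x"][a] & row["z"][a]               # Y -> -X
        row["z"][a] ^= row["x"][a]                           # X -> Y, Z unchanged


def cnot(tab, c, t):
    for row in tab:
        row["r"] ^= row["x"][c] & row["z"][t] & (row["x"][t] ^ row["z"][c] ^ 1)
        row["x"][t] ^= row["x"][c]                           # X on control spreads to target
        row["z"][c] ^= row["z"][t]                           # Z on target spreads to control


def show(tab):
    names = {(0, 0): "I", (1, 0): "X", (1, 1): "Y", (0, 1): "Z"}
    return ["+-"[row["r"]] + "".join(names[(x, z)] for x, z in zip(row["x"], row["z"]))
            for row in tab]


tab = initial_tableau(2)
print(show(tab))
hadamard(tab, 0)
cnot(tab, 0, 1)
print(show(tab))
```

Each gate touches a fixed number of columns in every one of the $N$ rows, so
a single one- or two-qubit update costs a number of bit operations
proportional to $N$.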

If one wishes to measure the expectation value of an observable
$O \in \mathcal{P}_N$, one may proceed as follows: The group $\mathcal{S}$
does not change if one of its generators $g_i$ is replaced by the product of
$g_i$ with another generator $g_j$ ($j \neq i$). When carrying out such a
multiplication, one notices that it is an operation much akin to that of
adding two rows in a linear equation system, and in fact, it is easy to see
that one can perform an algorithm completely analogous to Gaussian
elimination in order to bring the stabilizer tableau into triangular form.
This then allows one to read off the expectation value of the Pauli
observable. In this context, it is also worth noting that the expectation
value is always either 0, 1, or $-1$, so that each of the two measurement
outcomes occurs with probability 1/2, 1, or 0. This already hints at the
major reason why stabilizer states are a rather special kind of states that
is unsuitable to represent generic quantum states. In particular, a quantum
computer restricted to containing only stabilizer states within its quantum
register cannot do any calculations efficiently that an ordinary
("classical") computer could not do as well, because its action can be
tracked and simulated by classical means in the manner just sketched. This
result is usually called the Gottesman-Knill theorem ([Got98a], see also
[NC00]) and may be summarized as follows:

Theorem 2.1 (D. Gottesman and E. Knill) A quantum circuit using only the
following elements (called a stabilizer circuit) can be simulated efficiently on a classical
computer:

• preparation of qubits in computational basis states 

• quantum gates from the Clifford group 

• measurements in the computational basis 

The power of this theorem stems from the fact that many protocols 
in quantum information theory use only Clifford gates. This includes, 
most importantly, the techniques for quantum error correction, namely the 
Calderbank-Shor-Steane (CSS) codes (including the fault-tolerant scheme for 
all stabilizer codes found by Gottesman [Got98b]), and most entanglement 
purification protocols. 

Such a simulator, using the procedure sketched above, needs space of
$O(N^2)$ to keep track of the stabilizer tableau, and time of $O(kN)$ to update
the tableau in order to reflect the action of a k-qubit Clifford operation on all
N stabilizer generators. For a measurement, a Gaussian elimination has to be
performed, which needs time $O(N^3)$ [TB97].
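
The tableau update can be made concrete with a small sketch. It is not the simulator discussed in this thesis; it assumes the common binary encoding of the tableau (one x bit and one z bit per operator, one sign bit per generator) and the update rules for H, S and CNOT as given in [AG04]. The class name `Tableau` and its methods are hypothetical names chosen for this illustration. Each gate touches one or two columns of all N rows, which is the $O(kN)$ update cost quoted above.

```python
import numpy as np


class Tableau:
    """Stabilizer tableau of N generators on N qubits (illustrative sketch).

    x[i, q] and z[i, q] encode the Pauli that generator i applies to qubit q
    (00 = I, 10 = X, 01 = Z, 11 = Y); r[i] = 1 marks a prefactor of -1.
    """

    def __init__(self, n):
        self.n = n
        self.x = np.zeros((n, n), dtype=np.uint8)
        self.z = np.eye(n, dtype=np.uint8)  # |0...0> is stabilized by Z on each qubit
        self.r = np.zeros(n, dtype=np.uint8)

    def h(self, a):
        """Hadamard on qubit a: X <-> Z, Y -> -Y."""
        self.r ^= self.x[:, a] & self.z[:, a]
        self.x[:, a], self.z[:, a] = self.z[:, a].copy(), self.x[:, a].copy()

    def s(self, a):
        """Phase gate on qubit a: X -> Y, Y -> -X, Z -> Z."""
        self.r ^= self.x[:, a] & self.z[:, a]
        self.z[:, a] ^= self.x[:, a]

    def cnot(self, a, b):
        """CNOT with control a and target b (update rule as in [AG04])."""
        self.r ^= self.x[:, a] & self.z[:, b] & (self.x[:, b] ^ self.z[:, a] ^ 1)
        self.x[:, b] ^= self.x[:, a]
        self.z[:, a] ^= self.z[:, b]

    def generators(self):
        """Generators as strings such as '+XZI'."""
        names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
        rows = []
        for i in range(self.n):
            paulis = "".join(
                names[(int(self.x[i, q]), int(self.z[i, q]))] for q in range(self.n)
            )
            rows.append(("-" if self.r[i] else "+") + paulis)
        return rows


# Preparing a Bell pair from |00>: H on qubit 0, then CNOT from 0 to 1.
bell = Tableau(2)
bell.h(0)
bell.cnot(0, 1)
print(bell.generators())  # ['+XX', '+ZZ']
```

Measurements are deliberately left out of this sketch, since they require the elimination step described above.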

In Ref. [AG04], Aaronson and Gottesman found that the measurement
can be simulated in only quadratic instead of cubic time. They achieve this
by keeping track not only of the tableau of stabilizers but also of a second
data structure of the same size that they call the "destabilizer tableau".
Improving on this, we have found a way to use so-called graph states in order
to simulate even faster, at least under favorable conditions. Our technique,
which is presented in Chapter 3, has a worst-case scaling of $O(N^2)$ as well,
but may reach $O(N \log N)$ under certain conditions which can be expected
to be fulfilled when simulating entanglement purification or quantum error
correction techniques. These two applications are the subject of Chapters 4
and 5.

But first, we need to study the local Clifford group and its role in relating 
stabilizer states and graph states. This is the topic of the following section. 


2.2 The local Clifford group 

2.2.1 Graph states and LC equivalence 

A special subset of the stabilizer states are the graph states, defined as
follows:

Definition 2.6 Given an undirected mathematical graph G = (V, E) with vertex
set V = {1, ..., N} and edge set E, the graph state associated with G is the stabilizer
state $|G\rangle \in (\mathbb{C}^2)^{\otimes N}$ that is stabilized by the following operators (generators of the
stabilizer group):

$$K_a := X^{(a)} \prod_{b:\{a,b\}\in E} Z^{(b)}, \qquad a \in V.$$

Here, $\{a, b\} \in E$ means that the qubits a and b, or rather the corresponding
vertices a and b of the graph G, are connected by a graph edge, and X, Y, Z
denote the Pauli matrices.

The stabilizer tableau of a graph state can be immediately constructed 
from the graph's adjacency matrix: 5 The sign column contains only "+1", 
the diagonal of the tableau has X at every place, each tableau operator at the 
same place as a 1 in the adjacency matrix is a Z, and all the other operators 
are I. 
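
The construction just described is short enough to spell out. The following sketch (the function name `graph_state_tableau` is a hypothetical choice for this illustration) builds the generators $K_a$ as strings from an adjacency matrix given as a nested list of 0/1 entries.

```python
def graph_state_tableau(adjacency):
    """Return the stabilizer generators K_a of the graph state as strings.

    Row a has an X on the diagonal, a Z wherever the adjacency matrix has a 1,
    and I elsewhere; every sign is +1.
    """
    n = len(adjacency)
    rows = []
    for a in range(n):
        paulis = "".join(
            "X" if b == a else ("Z" if adjacency[a][b] else "I") for b in range(n)
        )
        rows.append("+" + paulis)
    return rows


# Linear chain of three vertices: 0 - 1 - 2
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(graph_state_tableau(chain))  # ['+XZI', '+ZXZ', '+IZX']
```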

Definition 2.7 Two states $|\psi_1\rangle, |\psi_2\rangle \in \mathcal{H} = (\mathbb{C}^2)^{\otimes N}$ are called local-unitary (LU)
equivalent if there are local unitary operators $U_1, \ldots, U_N \in SU(2)$ such that

$$|\psi_2\rangle = \bigotimes_{a=1}^{N} U_a \, |\psi_1\rangle.$$

If this equation can even be fulfilled with local Clifford operators, $U_1, \ldots, U_N \in \mathcal{C}_1$,
then $|\psi_1\rangle$ and $|\psi_2\rangle$ are called local-Clifford (LC) equivalent.

5 The adjacency matrix A of a graph with N vertices is an N x N matrix with $A_{ab} = 1$ if
$\{a, b\} \in E$ and $A_{ab} = 0$ if $\{a, b\} \notin E$ or a = b.

As $\mathcal{C}_1 \subset SU(2)$, LC equivalence trivially implies LU equivalence. The
converse seems natural at least for stabilizer states. This
conjecture, namely that two LU-equivalent stabilizer states are always
LC-equivalent as well, has been studied extensively in the literature. Even
though no counterexample has been found so far, no proof has been found
either, and the conjecture is hence listed as entry 28 on Werner's list of open
problems in quantum information theory [Wer] (added there in 2005 by
D. Schlingemann). However, for a rather large subclass of the stabilizer
states, the conjecture has been proved [NDM05b], and further progress is
reported in Refs. [ZCCC07, GN07]. (Invariants that may help with further
enlargement of the subclass are obtained in [NDM05a].) Furthermore, two
states given by their stabilizer tableaus can be tested for LC equivalence with
an efficient algorithm [NDM04a].

Not only is any graph state LC-equivalent to a stabilizer state, but we can
also find for any stabilizer state an LC-equivalent graph state. This may be
done using a certain diagonalisation procedure on the stabilizer tableau
described in Ref. [NDM04b]. It makes possible an alternative description of a
stabilizer state: Instead of representing the stabilizer state $|S\rangle$ with its
stabilizer tableau, we represent it by the graph G associated with an LC-equivalent
graph state, and a list of the N local Clifford operators $U_a \in \mathcal{C}_1$ (a = 1, ..., N)
that have to be applied onto the qubits of the graph state to get the stabilizer
state, i. e. such that $|S\rangle = \left(\bigotimes_{a=1}^{N} U_a\right) |G\rangle$. An example is shown in Fig. 2.1.



In order to work with this representation, it is necessary to have an
enumeration of all the elements of the local Clifford group $\mathcal{C}_1$, a multiplication
table and other knowledge about it. This is provided in the following.

2.2.2 Enumeration of the local Clifford group 

Each element $U \in \mathcal{C}_1$ maps under conjugation any of the elements
$\sigma \in \mathcal{P}_1 = \{\pm 1, \pm i\} \cdot \{I, X, Y, Z\}$ of the local Pauli group onto another or the same
element of $\mathcal{P}_1$, $\sigma' = U \sigma U^\dagger \in \mathcal{P}_1$, and this in a bijective manner. Due to the
linearity of U, the mapping stays bijective even if we disregard the phases
and just consider the mapping $|\sigma| \mapsto |U \sigma U^\dagger|$ (where $|\sigma|$ is the operator out
of {I, X, Y, Z} that differs from $\sigma \in \mathcal{P}_1$ only by a global phase). Hence, each U
corresponds to one of the 6 permutations of {X, Y, Z}. (I does not participate
in the permutation, as $U I U^\dagger$ is always I.) We shall label these 6 permutations
(i. e., the 6 elements of the symmetric group $S_3$) by letters A to F as shown
in Fig. 2.2. If we now take signs into account, we see that for $\sigma \in \{X, Y, Z\}$,
$U \sigma U^\dagger = \pm |U \sigma U^\dagger|$, i. e. the conjugation can only cause a minus sign, but not
a phase of $\pm i$. This is because $|\sigma|$ is Hermitean, and U (as a unitary operator)
preserves Hermiticity, and while $-|\sigma|$ is Hermitean, $\pm i |\sigma|$ are not. Hence,
in order to characterize the action of an operator $U \in \mathcal{C}_1$ with respect to its
conjugation of elements of $\mathcal{P}_1$, we need to specify (i) the permutation $\tau \in S_3$
of {X, Y, Z} that it performs and (ii) for which of X, Y, Z a minus sign is introduced.
For (ii), it suffices to only specify the sign for the conjugation of X and Z, as
this implies the rest: If U performs the permutation $\tau$ and the signs for X
and Z are $\zeta_X, \zeta_Z \in \{-1, 1\}$, i. e., $U X U^\dagger = \zeta_X \tau(X)$ and $U Z U^\dagger = \zeta_Z \tau(Z)$, this
implies that

$$U Y U^\dagger = U \, \tfrac{i}{2} [X, Z] \, U^\dagger = \tfrac{i}{2} \, [U X U^\dagger, U Z U^\dagger] = \tfrac{i}{2} \, \zeta_X \zeta_Z \, [\tau(X), \tau(Z)] = \zeta_Y \tau(Y),$$

and as $[\tau(X), \tau(Z)]/2$ is, depending on $\tau$, either $i\tau(Y)$ or $-i\tau(Y)$, $\zeta_Y = \pm 1$
can be deduced from $\zeta_X$, $\zeta_Z$ and $\tau$.

Figure 2.1: Representing the encoded $|+\rangle$ state of the 7-qubit Steane code,
which is a stabilizer state, by means of a graph. If $|G\rangle$ denotes the graph state
associated with the depicted graph, then $|+\rangle = \bigotimes_{a=1}^{7} U_a |G\rangle$, where the
local Clifford operators $U_a$ are either I or H, as indicated in the figure.

Figure 2.2: The 6 permutations of the 3 Pauli matrices. Each permutation is
labelled with a letter from A to F. The signs in circles indicate the sign of the
permutation.

As there are 6 choices of permutations and 4 choices for assigning the
signs $\zeta_X = \pm 1$ and $\zeta_Z = \pm 1$, the local Clifford group has 6 · 4 = 24 elements.
This is true, however, only after we take care of one subtlety: With $U \in \mathcal{C}_1$,
$U' = e^{i\varphi} U$ with real $\varphi$ is a local Clifford operator, too, and hence, for any
assignment of $\tau$, $\zeta_X$ and $\zeta_Z$, there is an infinite number of operators in SU(2)
corresponding to this mapping. However, as we are only interested in the
action of U or U' when conjugating elements of $\mathcal{P}_1$, we can consider all the
$U' = e^{i\varphi} U$ as representatives of the same operation. After all, under the
conjugation $U' \sigma U'^\dagger$, the phase $e^{i\varphi}$ of U' cancels against the phase $e^{-i\varphi}$ of $U'^\dagger$. In
other words, as long as we are not interested in the action of local Clifford
operators when applying them onto states from $\mathbb{C}^2$ but only when
conjugating elements of $\mathcal{P}_1$, we may disregard any global phase $e^{i\varphi}$ or, speaking more
technically, factor out the group U(1) of global phase rotations. Hence, from
now on, we shall understand the symbol $\mathcal{C}_1$ to stand for the factor group

$$\mathcal{C}_1 := \{ U \in SU(2) \mid U \sigma U^\dagger \in \mathcal{P}_1 \ \forall \sigma \in \mathcal{P}_1 \} \, / \, U(1). \qquad (2.3)$$

This amends our earlier definition (2.2), which we now regard as having
been a bit "sloppy" in this respect. 6
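
The effect of factoring out the global phase can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard matrices $H = (X + Z)/\sqrt{2}$ and $S = \mathrm{diag}(1, i)$, and the helper names `key`, `strip_phase` and `closure` are hypothetical. It enumerates all distinct matrix products of H and S and then identifies matrices that differ only by a phase.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)


def key(m):
    """Hashable fingerprint of a matrix, rounded to absorb floating-point noise."""
    flat = np.round(m.flatten(), 6) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return tuple((float(v.real), float(v.imag)) for v in flat)


def strip_phase(m):
    """Rescale m by a phase so that its first non-zero entry is real and positive."""
    first = next(v for v in m.flatten() if abs(v) > 1e-9)
    return m * (abs(first) / first)


def closure(generators):
    """All distinct products of the generators, found by breadth-first search."""
    identity = np.eye(2, dtype=complex)
    found = {key(identity): identity}
    frontier = [identity]
    while frontier:
        new = []
        for m in frontier:
            for g in generators:
                p = g @ m
                k = key(p)
                if k not in found:
                    found[k] = p
                    new.append(p)
        frontier = new
    return list(found.values())


group = closure([H, S])
mod_phase = {key(strip_phase(m)) for m in group}
print(len(group), len(mod_phase))  # 192 24
```

The 192 matrices collapse to 24 classes once the phase is divided out, in line with the counting argument above.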

6 Another earlier remark should be restated more precisely as well: The Clifford group is
usually said to be generated by H, S and ΛX. For the local Clifford group, we only need H
and S, of course. If one now enumerates all distinct results from repeated multiplication of
the matrices H and S, one does not get 24 matrices, but 8 times as many, namely 192. This is
because two different products of generators may produce the same Clifford operator but with
different phases. Nevertheless, not all possible (uncountably many) phases are generated, but
Table 2.1: The 24 elements of the local Clifford group. The row index (the "permutation
symbol") shows how the operator U permutes the Pauli operators $\sigma = X, Y, Z$ under the
conjugation $\sigma' = \pm U \sigma U^\dagger$. The column index (the "sign symbol") indicates the sign
obtained under the conjugation: For operators U in the I column it is the sign of the permutation
(indicated on the left). For elements in the X, Y and Z columns, it is this sign only if the
conjugated Pauli operator is the one indicated by the column header and the opposite sign otherwise.

For each of the 24 elements of $\mathcal{C}_1$, we may give a matrix representation.
Table 2.1 gives 24 matrices that shall from now on be our canonical
representatives of the elements of $\mathcal{C}_1$. Of course, each of these matrices may be
replaced by another one that differs from it only by a global phase.

In this table, the representatives (or their phases) are chosen such that the 
following convention holds: First, some of the operators are Hermitean, as 
can be seen from the fact that some of their matrices are Hermitean. It is of 
course desirable to represent a Hermitean operator by a Hermitean matrix, 
so that we choose only among those, if applicable. Furthermore, the Pauli 
matrices should be included in their canonical form of Eq. (2.1), satisfying 
the canonical commutation relations of the Pauli matrices. To ensure this, 
we proceed as follows: We select that matrix that has a positive real value in 
its upper left corner. If there is no such matrix (which happens if the entry 
vanishes), we try to make the upper right-hand matrix entry positive and 
real. Failing that (because it would select a non-Hermitean matrix, though 
there would be an Hermitean one), we require the upper right-hand entry to 
lie in the fourth quadrant of the complex plane, boundaries included. 

The table also sorts these matrices by labelling them with two symbols, a
permutation symbol out of $S_3 = \{A, \ldots, F\}$ (row headings) and a sign symbol
out of $\mathcal{P} = \{I, X, Y, Z\}$ (column headings). We will write the permutation
symbol as superscript to the sign symbol, e. g. $Y^C$, to specify a local Clifford
operator, and write for a general local Clifford operator

$$\Sigma^\pi, \qquad \Sigma \in \mathcal{P}, \ \pi \in S_3.$$

The permutation symbol indicates how the Clifford operator permutes
{X, Y, Z}, as already explained above. Hence the conjugation of a Pauli
operator $\sigma \in \mathcal{P}_1$ by a local Clifford operator $\Sigma^\pi$ can be written in general as
follows:

$$\Sigma^\pi \, \sigma \, (\Sigma^\pi)^\dagger = \zeta \, \pi(\sigma) \qquad (2.4)$$

where $\pi(\sigma)$ is the result of the permutation $\pi$ acting on $\sigma$ and $\zeta$ is the sign factor,
to be discussed next.

As we have seen already, the global phase brought about by the
conjugation, here denoted $\zeta$, is always either +1 or -1, never non-real. The reason
for this is that conjugation under a unitary matrix preserves Hermiticity, i. e.
$\Sigma^\pi \sigma (\Sigma^\pi)^\dagger$ is Hermitean as is the Pauli operator $\sigma$. But $\pi(\sigma)$ on the r. h. s. of
Eq. (2.4) is Hermitean as well, and this forces $\zeta = \zeta^*$, i. e. $\zeta = \pm 1$.

Recall that for two of the three Pauli operators $\sigma$ in Eq. (2.4), we can choose
the sign given by $\zeta$. But this then fixes the sign for the third Pauli operator,
as it can be expressed as product of the other two. Hence, per permutation,
there are 4 possible sign choices.
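
The statement that a permutation together with the signs for X and Z fixes the operator can be checked by brute force. The following sketch assumes the standard Pauli, Hadamard and phase matrices; the helpers `action` and `sign_of` are hypothetical names. It generates the group from H and S, identifies elements by their conjugation action, and counts the combinations that occur.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
S = np.diag([1, 1j])
NAMES = "XYZ"
PAULIS = dict(zip(NAMES, (X, Y, Z)))


def action(u):
    """((image, sign), ...) for sigma = X, Y, Z with u sigma u^dagger = sign * image."""
    result = []
    for sigma in (X, Y, Z):
        conj = u @ sigma @ u.conj().T
        for image, tau in PAULIS.items():
            overlap = np.trace(tau @ conj).real / 2  # +1 or -1 for the image, else 0
            if abs(overlap) > 0.5:
                result.append((image, int(round(overlap))))
    return tuple(result)


def sign_of(images):
    """Sign of the permutation X, Y, Z -> images (parity of the inversion count)."""
    idx = [NAMES.index(p) for p in images]
    inversions = sum(idx[i] > idx[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inversions % 2 else 1


# Breadth-first generation; the conjugation action identifies an element up to phase.
elements = {action(np.eye(2)): np.eye(2, dtype=complex)}
frontier = list(elements.values())
while frontier:
    new = []
    for m in frontier:
        for g in (H, S):
            p = g @ m
            if action(p) not in elements:
                elements[action(p)] = p
                new.append(p)
    frontier = new

permutations = {tuple(img for img, _ in a) for a in elements}
xz_choices = {(tuple(img for img, _ in a), a[0][1], a[2][1]) for a in elements}
# The sign for Y equals sgn(permutation) * sign_X * sign_Z for every element.
y_fixed = all(
    a[1][1] == sign_of([img for img, _ in a]) * a[0][1] * a[2][1] for a in elements
)
print(len(elements), len(permutations), len(xz_choices), y_fixed)  # 24 6 24 True
```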

only the matrices shown in Table 2.1 and their multiples by the phase factors $e^{ik\pi/4}$ (k =
1, ..., 7). Hence, with the definition of Eq. (2.3), we have
$\langle H, S \rangle = \{ e^{ik\pi/4} \mid k = 0, \ldots, 7 \} \cdot \mathcal{C}_1$.
Table 2.2: Standard symbols for commonly used local Clifford operators. The
expressions with square roots correspond to those used in Ref. [HEB04]. The
canonical generators of the group are shaded.

Table 2.3: The Hermitean adjoints of the operators. Self-adjoint (i. e.
Hermitean) operators are shaded.


As already mentioned, we label the 4 sign choices by sign symbols from
$\mathcal{P}$. These mean that the sign is chosen in the same way (for even
permutations) or the opposite way (for odd permutations) as if $\Sigma^\pi$ were a Pauli
operator.

In other words: Which sign appears in Eq. (2.4) is indicated by the sign
symbol, which is to be interpreted according to the following rule: If the sign
symbol $\Sigma$ is I or the same as $\sigma$, then the sign $\zeta$ of the conjugation $\Sigma^\pi \sigma (\Sigma^\pi)^\dagger$
is the same as the sign of the permutation $\pi$, otherwise it is the opposite sign:

$$\zeta = \begin{cases} +\operatorname{sgn} \pi & \text{if } \Sigma = I \text{ or } \Sigma = \sigma \\ -\operatorname{sgn} \pi & \text{otherwise} \end{cases} \qquad (2.5)$$

As we have now assigned a unique symbol to each operator, we can
identify the ones often used. Table 2.2 shows alternative symbols used in other
literature for common operators.

Table 2.3 shows which of the operators are Hermitean, and also gives the 
Hermitean adjoints of the other operators. 
                       right operand
left operand           $\Sigma_r = I$             $\Sigma_r = X, Y, Z$

$\Sigma_l = I$             I                       $\Sigma_r$

$\Sigma_l = X, Y, Z$       $\pi_r^{-1}(\Sigma_l)$           two sub-cases:
                                               $\Sigma_r = \pi_r^{-1}(\Sigma_l)$: I
                                               $\Sigma_r \neq \pi_r^{-1}(\Sigma_l)$: the only element in
                                               $\{X, Y, Z\} \setminus \{\Sigma_r, \pi_r^{-1}(\Sigma_l)\}$

Table 2.4: The sign symbol $\Sigma$ of the product $\Sigma^\pi = \Sigma_l^{\pi_l} \Sigma_r^{\pi_r}$ is determined
according to the rules given in this table.

2.2.3 Multiplication 

As $\mathcal{C}_1$ is a rather small finite group, its multiplication table can be given
explicitly. It can be found either by explicit multiplication of the matrices in
Table 2.1 or systematically using the rules developed in the following.
Table 2.5 shows the multiplication table, sorted first by the permutation and
then by the sign symbol. The same information, sorted by sign symbol first,
is provided in Table 2.6.

The former table exhibits a block structure due to the fact that under
multiplication, the permutations are simply concatenated.

We shall now develop a general rule to multiply operators in $\mathcal{C}_1$ without
resorting to the multiplication table. Consider the following general product:

$$\Sigma_l^{\pi_l} \, \Sigma_r^{\pi_r} = \Sigma^\pi. \qquad (2.6)$$

The permutation symbols obviously "transport" the group structure of the
symmetric group $S_3$ to $\mathcal{C}_1$, and hence $\pi$ is obtained simply by concatenating
the permutations, $\pi_l$ after $\pi_r$:

$$\pi = \pi_l \circ \pi_r$$

For the sign symbols, one has to consider four cases. The rules for these cases
are given in Table 2.4 and proved in subsection 2.2.4.

As a corollary, we observe that every local Clifford operator $\Sigma^\pi$ can be
decomposed into a permutation $I^\pi$ and a Pauli operator $\Sigma^A = \Sigma$:

$$\Sigma^\pi = I^\pi \, \Sigma \qquad (2.7)$$


2.2.4 Proof of multiplication rules 

This technical subsection proves the rules given in Table 2.4. 
Table 2.5: The multiplication table of the local Clifford group.

Table 2.6: The same multiplication table as in Table 2.5 but with the group elements sorted in a different way.


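To derive the multiplication rules for the sign symbols as given in
Table 2.4, we write the conjugation (cf. Eq. (2.4)) as follows:

$$\Sigma^\pi \sigma (\Sigma^\pi)^\dagger = \zeta_{\Sigma,\pi}(\sigma) \, \pi(\sigma), \qquad (2.8)$$

where $\zeta_{\Sigma,\pi}$ is a function that maps its argument $\sigma$ onto the appropriate sign
required for a conjugation of $\sigma$ under $\Sigma^\pi$ according to Eq. (2.5). This latter
equation is rewritten as:

$$\zeta_{\Sigma,\pi}(\sigma) = -(-1)^{\delta_{\Sigma,I}} (-1)^{\delta_{\Sigma,\sigma}} \operatorname{sgn} \pi = (-1)^{1 \oplus \delta_{\Sigma,I} \oplus \delta_{\Sigma,\sigma}} \operatorname{sgn} \pi, \qquad (2.9)$$

where $\delta$ is the Kronecker symbol and $\oplus$ denotes addition modulo 2.
Considering a product

$$\Sigma^\pi = \Sigma_2^{\pi_2} \Sigma_1^{\pi_1}$$

we examine the action of a conjugation under the product:

$$\Sigma^\pi \sigma (\Sigma^\pi)^\dagger = \Sigma_2^{\pi_2} \Sigma_1^{\pi_1} \sigma (\Sigma_1^{\pi_1})^\dagger (\Sigma_2^{\pi_2})^\dagger = \zeta_{\Sigma_1,\pi_1}(\sigma) \, \Sigma_2^{\pi_2} \pi_1(\sigma) (\Sigma_2^{\pi_2})^\dagger$$
$$= \zeta_{\Sigma_2,\pi_2}(\pi_1(\sigma)) \, \zeta_{\Sigma_1,\pi_1}(\sigma) \, \pi_2(\pi_1(\sigma))$$

As this is equal to Eq. (2.8), we find not only

$$\pi = \pi_2 \circ \pi_1$$

but also

$$\zeta_{\Sigma,\pi}(\sigma) = \zeta_{\Sigma_2,\pi_2}(\pi_1(\sigma)) \, \zeta_{\Sigma_1,\pi_1}(\sigma).$$

If we rewrite this last equation using the expansion Eq. (2.9), we can equate
the exponents of (-1) on both sides:

$$1 \oplus \delta_{\Sigma,I} \oplus \delta_{\Sigma,\sigma} = \delta_{\Sigma_2,I} \oplus \delta_{\Sigma_1,I} \oplus \delta_{\pi_1^{-1}(\Sigma_2),\sigma} \oplus \delta_{\Sigma_1,\sigma} \qquad \forall \sigma \in \{X, Y, Z\} \qquad (2.10)$$

(Note that we used $\delta_{\Sigma_2,\pi_1(\sigma)} = \delta_{\pi_1^{-1}(\Sigma_2),\sigma}$ and cancelled $\operatorname{sgn} \pi$ against
$\operatorname{sgn} \pi_2 \operatorname{sgn} \pi_1$.)

Now, we can consider the 4 cases:

Case 1: $\Sigma_1 = \Sigma_2 = I$. In this case, the r. h. s. of Eq. (2.10) is 0 for all $\sigma$.
The l. h. s. can only be 0 for all $\sigma$ if $\Sigma = I$.

Case 2: $\Sigma_1 \neq I$, $\Sigma_2 = I$. Here, Eq. (2.10) takes the form
$1 \oplus \delta_{\Sigma,I} \oplus \delta_{\Sigma,\sigma} = 1 \oplus \delta_{\Sigma_1,\sigma}$, requiring $\Sigma = \Sigma_1$.

Case 3: $\Sigma_1 = I$, $\Sigma_2 \neq I$. Analogous to the previous cases, we obtain
$\Sigma = \pi_1^{-1}(\Sigma_2)$.

Case 4: $\Sigma_1 \neq I$, $\Sigma_2 \neq I$. Eq. (2.10) reads

$$1 \oplus \delta_{\Sigma,I} \oplus \delta_{\Sigma,\sigma} = \delta_{\pi_1^{-1}(\Sigma_2),\sigma} \oplus \delta_{\Sigma_1,\sigma} \qquad \forall \sigma \in \{X, Y, Z\}.$$

We distinguish two sub-cases:

Case 4a: $\pi_1^{-1}(\Sigma_2) = \Sigma_1$. The r. h. s. of the equation above is 0 for all 3 values of $\sigma$,
forcing $\Sigma = I$.

Case 4b: $\pi_1^{-1}(\Sigma_2) \neq \Sigma_1$. If $\sigma$ is either $\pi_1^{-1}(\Sigma_2)$ or $\Sigma_1$, the r. h. s. of
the equation above is 1. Hence, $\Sigma$ can be neither I nor this $\sigma$. So it has to be the
one element that remains in {X, Y, Z} after one has taken out $\pi_1^{-1}(\Sigma_2)$ and
$\Sigma_1$. With this assignment the equation holds for all 3 values of $\sigma$. □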
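
The rules can also be cross-checked by brute force over all 24 · 24 products. The sketch below is an independent check, not the procedure used to produce Tables 2.5 and 2.6. It assumes the standard matrices for X, Y, Z, H and S, labels every group element by its permutation and by the sign symbol defined through Eq. (2.5), and compares each product with the prediction of Table 2.4. Permutations are stored as tuples of images of (X, Y, Z) rather than as the letters A to F, and the helper names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
S = np.diag([1, 1j])
NAMES = "XYZ"
PAULIS = dict(zip(NAMES, (X, Y, Z)))


def label(u):
    """Return (perm, sign_symbol) of u; sign_symbol follows Eq. (2.5)."""
    perm, signs = {}, {}
    for name, sigma in PAULIS.items():
        conj = u @ sigma @ u.conj().T
        for image, tau in PAULIS.items():
            overlap = np.trace(tau @ conj).real / 2
            if abs(overlap) > 0.5:
                perm[name], signs[name] = image, int(round(overlap))
    idx = [NAMES.index(perm[s]) for s in NAMES]
    inversions = sum(idx[i] > idx[j] for i in range(3) for j in range(i + 1, 3))
    sgn = -1 if inversions % 2 else 1
    matching = [s for s in NAMES if signs[s] == sgn]  # Paulis whose sign is sgn(pi)
    symbol = "I" if len(matching) == 3 else matching[0]
    return tuple(perm[s] for s in NAMES), symbol


def local_clifford_group():
    """All 24 elements, keyed by label, generated from H and S."""
    found = {label(np.eye(2)): np.eye(2, dtype=complex)}
    frontier = list(found.values())
    while frontier:
        new = []
        for m in frontier:
            for g in (H, S):
                p = g @ m
                if label(p) not in found:
                    found[label(p)] = p
                    new.append(p)
        frontier = new
    return found


def predicted_symbol(sym_l, sym_r, perm_r):
    """Sign symbol of (left operand) * (right operand) according to Table 2.4."""
    if sym_l == "I":
        return sym_r
    pulled = NAMES[perm_r.index(sym_l)]  # inverse of the right permutation on sym_l
    if sym_r == "I":
        return pulled
    if sym_r == pulled:
        return "I"
    return (set(NAMES) - {sym_r, pulled}).pop()


group = local_clifford_group()
mismatches = 0
for (perm_l, sym_l), u_l in group.items():
    for (perm_r, sym_r), u_r in group.items():
        perm, sym = label(u_l @ u_r)
        composed = tuple(perm_l[NAMES.index(perm_r[i])] for i in range(3))
        if perm != composed or sym != predicted_symbol(sym_l, sym_r, perm_r):
            mismatches += 1
print(len(group), mismatches)  # 24 0
```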
Figure 2.3: The action of the SO(3) equivalent to $\mathcal{C}_1$ onto an arbitrarily chosen
Bloch vector. This means that all concatenations of the following generators
are shown: The rotation, S, by 90° around the z axis (red line) and the
reflection, H, off the green circle, indicating the half-angle plane between x and
z axis. Note that this picture shows twice as many symmetry operations as
there are in $\mathcal{C}_1$.


2.2.5 Action on the Bloch sphere 

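It is often helpful to visualize qubits as points on the Bloch sphere: A qubit in
the pure state $\alpha |0\rangle + \beta |1\rangle =: (\alpha, \beta)^T$ is represented by the Bloch vector

$$v = \begin{pmatrix} \sin\vartheta \cos\varphi \\ \sin\vartheta \sin\varphi \\ \cos\vartheta \end{pmatrix} \qquad (2.11)$$

with

$$\varphi = \arg \frac{\beta}{\alpha}, \qquad \vartheta = 2 \arctan \left| \frac{\beta}{\alpha} \right|. \qquad (2.12)$$

So, for example, the eigenstates of Z, i. e. $|0\rangle = (1, 0)^T$ and $|1\rangle = (0, 1)^T$,
are mapped onto Bloch vectors $(0, 0, 1)^T$ and $(0, 0, -1)^T$, and the eigenstates
of X, namely $|\pm\rangle = \frac{1}{\sqrt{2}} (1, \pm 1)^T$, become $(\pm 1, 0, 0)^T$. The remaining two
axis-parallel unit vectors, $(0, \pm 1, 0)^T$, correspond to the eigenstates of Y,
$\frac{1}{\sqrt{2}} (|0\rangle \pm i |1\rangle) = \frac{1}{\sqrt{2}} (1, \pm i)^T$.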
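
The mapping from amplitudes to Bloch vectors is easy to check numerically. The sketch below is an illustration with hypothetical function names. It computes the Bloch vector once from the expectation values of X, Y and Z and once from the two angles, taking $\vartheta$ as the polar and $\varphi$ as the azimuthal angle, and compares both for the eigenstates listed above.

```python
import numpy as np


def bloch_vector(alpha, beta):
    """Bloch vector (<X>, <Y>, <Z>) of the state alpha|0> + beta|1>."""
    alpha, beta = complex(alpha), complex(beta)
    norm = np.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    a, b = alpha / norm, beta / norm
    c = a.conjugate() * b
    return np.array([2 * c.real, 2 * c.imag, abs(a) ** 2 - abs(b) ** 2])


def bloch_vector_from_angles(alpha, beta):
    """The same vector from polar angle theta and azimuthal angle phi (alpha != 0)."""
    ratio = complex(beta) / complex(alpha)
    phi = np.angle(ratio)
    theta = 2 * np.arctan(abs(ratio))
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])


print(bloch_vector(1, 0))    # eigenstate of Z with eigenvalue +1
print(bloch_vector(1, 1))    # eigenstate of X with eigenvalue +1
print(bloch_vector(1, 1j))   # eigenstate of Y with eigenvalue +1

generic = (0.6, 0.8 * np.exp(0.3j))
print(np.allclose(bloch_vector(*generic), bloch_vector_from_angles(*generic)))
```

Working with expectation values avoids the division by $\alpha$, so the first function also covers the state $|1\rangle$.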

Via this mapping we can interpret the action of the local Clifford operators 
as symmetry transformations on the Bloch sphere, i. e. as rotations and/or 
reflections of the Bloch vector that corresponds to the state they act on. 
Figure 2.4: The action of the 24 local Clifford operators onto an arbitrarily
chosen Bloch vector, namely the orange one. The green circles mark the axes
planes. The back side of the sphere is shown in Fig. 2.5.


The effect of the Pauli operators X, Y and Z is seen readily: They simply
rotate the Bloch vector by 180° about the x, y, and z axis. The S operator,
being a square root of Z, is hence a rotation about the z axis by 90°, where the
direction of rotation is according to the right hand rule. 7 This explains the
name "phase rotation" for S, as it is the relative phase which gets transformed into
the azimuthal angle of the Bloch vector in Eq. (2.12).

What does the Hadamard operator H do? To see that, we build a
corresponding matrix acting on the Bloch sphere surface, i. e. an SO(3) matrix,
by regarding the action of H onto the three states corresponding to the basis
vectors in $\mathbb{R}^3$. These are mapped as follows:



7 which says that the bent fingers of one's right hand indicate the direction of rotation if one
points one's outstretched thumb in the direction of the rotation axis.


Figure 2.5: The back side of the Bloch sphere of Fig. 2.4.


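Hence, the two generators of $\mathcal{C}_1$ correspond to the following two SO(3)
matrices:

$$H = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \qquad S = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$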

Thus, the Hadamard transform exchanges x and z coordinates on the
Bloch sphere. This is in effect a reflection off the half-angle plane between
the x and the z axis.

We can now try to visualize the action of all elements of $\mathcal{C}_1$ on the Bloch
sphere. A naive answer would be as follows: We simply concatenate in
arbitrary orders the two operations rotation by 90° about z (S) and reflecting
off the x-z half-angle plane (H), and see which points on the sphere we can reach
after starting from a fixed point, for which we choose one that is generic in the
sense that it does not lie in any fixed set of the transformations.

Fig. 2.3 shows the result. It shows 48 points, twice as many as expected.
This is, of course, due to the main difference between SU(2) and SO(3): If one
rotates a spin-1/2 particle once by a full 360° around an arbitrary axis, one does
not get back to the same state, but picks up a minus sign. The full rotation is
not a symmetry operation of the spinor space, but the double rotation (720°)
is. The SO(3) group, however, describes rotations in ordinary $\mathbb{R}^3$ space, and
hence includes the 360° rotation. Thus, it is to be expected to find the number
of symmetry operations doubling, when one goes, via Eq. (2.11), from spinor
space $\mathbb{C}^2$ (handled by SU(2)) to $\mathbb{R}^3$ (handled by SO(3)).

The proper way is to do the described operation in the other order:
Choose an arbitrary state 8 and let all the 24 Clifford operators act on it, then
transform these 24 states into Bloch vectors with Eq. (2.11) and mark them
on the Bloch sphere. The result is shown in Figs. 2.4 and 2.5.

2.2.6 Symplectic formalism 

Let us come back to the stabilizer formalism. For certain applications it is
useful to represent the n stabilizer generators (without their phases) as n
vectors from the 2n-dimensional vector space $\mathbb{F}_2^{2n}$ over the binary field. (This formalism has been introduced
in [DM03] and is used e. g. in [NDM04b].) In these vectors, a "1" placed at
position j and a "0" at position n + j means that the tensor product
representation of the Pauli operator contains an X acting on the j-th qubit, i. e. the "1"
in the upper half of the vector codes for an X operation. Likewise, the lower half
codes for Z operations: a "0" at position j and a "1" at n + j means a Z acting
on qubit j. Consequently, "0" at both positions means the identity, and "1"
at both positions a combined X and Z, i. e. a Y. Note that all phases of the
generator matrices are lost when translating to this notation.

To see the effect of a Clifford operation on a stabilizer state, one has to
conjugate the stabilizer generators under the Clifford operations. For this
purpose, the $\mathbb{F}_2^{2n}$ vectors are left-multiplied with a 2n x 2n matrix over $\mathbb{F}_2$
(satisfying a certain symplecticity condition) which represents the Clifford
operation.

For a local Clifford operation, acting only on qubit j, this matrix has a
simple structure: It is equal to the identity matrix, except at the j-th and the
(n + j)-th row and column. There, it is all zero, except for the four entries at
the intersection of said rows and columns, which form a 2 x 2 matrix which
has to be invertible [NDM04b].

The six 2x2 matrices that are invertible over F 2 correspond to the six 
permutation symbols: 



The sign symbol cannot be mapped to this formalism as all signs are lost. 
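
That there are exactly six such matrices, and that each of them permutes the three non-zero bit pairs, can be seen by direct enumeration. The sketch below uses the encoding described above (X = (1, 0), Z = (0, 1), Y = (1, 1)); the function names are hypothetical, and no assignment of the matrices to the letters A to F is implied.

```python
from itertools import product

PAULI_BITS = {"X": (1, 0), "Z": (0, 1), "Y": (1, 1)}  # (x bit, z bit)
BITS_PAULI = {bits: name for name, bits in PAULI_BITS.items()}


def invertible_2x2_over_f2():
    """All 2 x 2 matrices over F_2 with determinant 1 (mod 2)."""
    matrices = []
    for a, b, c, d in product((0, 1), repeat=4):
        if (a * d + b * c) % 2 == 1:  # over F_2, minus equals plus
            matrices.append(((a, b), (c, d)))
    return matrices


def permutation_of(matrix):
    """Permutation of {X, Y, Z} induced by the matrix acting on (x, z) bit pairs."""
    (a, b), (c, d) = matrix
    perm = {}
    for name, (x, z) in PAULI_BITS.items():
        image = ((a * x + b * z) % 2, (c * x + d * z) % 2)
        perm[name] = BITS_PAULI[image]
    return perm


matrices = invertible_2x2_over_f2()
for m in matrices:
    print(m, permutation_of(m))
```

For instance, the matrix ((0, 1), (1, 0)) exchanges X and Z and leaves Y in place, which is the permutation performed by the Hadamard operator.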


8 for the figures, I took, as a generic state. 
In the next chapter we shall use the concepts introduced in this chapter
for the purpose of simulating quantum circuits.
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Abstract 

According to the Gottesman-Knill theorem, a class of quantum circuits, 
namely the so-called stabilizer circuits, can be simulated efficiently on a 
classical computer. We introduce a new algorithm for this task, which is 
based on the graph-state formalism. It shows significant improvement in 
comparison to an existing algorithm, given by Gottesman and Aaronson, in 
terms of speed and of the number of qubits the simulator can handle. We 
also present an implementation. 

[PACS: 03.67.-a, 03.67.Lx, 02.70.-c] 


3.1 Introduction 

Protocols in quantum information science often use entangled states of a
large number of qubits. A major challenge in the development of such
protocols is to actually test them using a classical computer. This is
because a straightforward simulation is typically exponentially slow and
hence intractable. Fortunately, the Gottesman-Knill theorem ([Got98a],
[NC00]) states that an important subclass of quantum circuits can be
simulated efficiently, namely so-called stabilizer circuits. These are
circuits that use only gates from a restricted subset, the so-called Clifford
group. Many techniques in quantum information use only Clifford gates,
most importantly the standard algorithms for entanglement purification
[BBP+96, DEJ+96, MPP+98, MS02, DAB03] and for quantum error correction
[Sho95, Ste96b, CS96, Ste96a]. Hence, if one wishes to study such networks,
one can simulate them numerically.

The usual proof of the Gottesman-Knill theorem (as stated e. g. in [NC00])
contains an algorithm that can carry out this task in time $O(N^3)$, where N is
the number of qubits. Especially for the applications just mentioned, one
is interested in a large N: For entanglement purification one might want to
study large ensembles of states, and for quantum error correction
concatenations of codes. The cubic scaling renders this extremely
time-consuming, and a more efficient algorithm should be of great use.

Recently, Aaronson and Gottesman presented such an algorithm (and an
implementation of it) in Ref. [AG04], whose time and space requirements
scale only quadratically with the number of qubits. In the present paper,
we further improve on this by presenting an algorithm that for typical
applications only requires time and space of $O(N \log N)$. While Aaronson and
Gottesman's simulator, when used on an ordinary desktop computer, can
already simulate systems of several thousands of qubits in a reasonable time,
we have used our simulator for over a million qubits. This provides a
valuable tool for investigating complex protocols such as our study of
multi-party entanglement purification protocols in Ref. [KADB06].

The crucial new ingredient is the use of so-called graph states. Graph
states have been introduced in [BR01] for the study of entanglement
properties of certain multi-qubit systems; they were used as starting point
for the one-way quantum computer (i. e., measurement-based quantum computing)
[RBB03], and found to be suited to give a graphical description of CSS codes
(for quantum error correction) [SW02]. Graph states take their name from
the concept of graphs in mathematics: Each qubit corresponds to a vertex of
the graph, and the graph's edges indicate which qubits have interacted (see
below for details).

There is an intimate correspondence between stabilizer states (the class of
states that can appear in a stabilizer circuit) and graph states: Not only is
every graph state a stabilizer state, but also every stabilizer state is
equivalent to a graph state in the following sense: Any stabilizer state can be
transformed to a graph state by applying a tensor product of local Clifford
(LC) operations [Sch01, GKR02, NDM04b]. We shall call these local Clifford
operators the vertex operators (VOPs).

To represent a stabilizer state in computer memory, one stores its tableau
of stabilizer operators, which is an $N \times N$ matrix of Pauli operators and
hence takes space of order $O(N^2)$ (see below for details). Gottesman and
Aaronson's simulator extends this matrix by another matrix of the same size
(which they call the destabilizer tableau), so that their simulator has space
complexity $O(N^2)$. A graph state, on the other hand, is described by a
mathematical graph, which, for reasons argued later, only needs space of
$O(N \log N)$ in typical applications. Hence, much larger systems can be
represented in memory if one describes them as graph states, supplemented with
the list of VOPs. However, we also need efficient ways to calculate how this
representation changes when the represented state is measured or undergoes a
Clifford gate application. The effect of measurements has been extensively
studied in [HEB04], and gate application is what we will study in this paper,
so that we can then assemble both to a simulation algorithm.

This paper is organized as follows: We first review the stabilizer
formalism, the Gottesman-Knill theorem, and the graph state formalism in
Section 3.2. There, we will also explain our representation in detail.
Section 3.3 explains how the state representation changes when Clifford gates
are applied. This is the main result and the most technical part of the paper.
For the simulation of measurements, we can rely on the studies of Ref.
[HEB04], which are reviewed and applied for our purpose in Section 3.4. Having
exposed all parts of the simulator algorithm, we continue by presenting our
implementation of it in Section 3.5. A reader who only wishes to use our
simulator and is not interested in its internals may want to read only this
section. Section 3.6 assesses the time requirements of the algorithm's
components described in Sections 3.3 and 3.4 in order to prove our claim of
superior scaling of performance. We finish with a conclusion (Section 3.7).

3.2 Stabilizer and graph states 

We start by explaining the concepts mentioned in the introduction in a formal 
manner. 

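Definition 3.1 The Clifford group $\mathcal{C}_N$ on N qubits is defined as the
normalizer of the Pauli group $\mathcal{P}_N$:

\[ \mathcal{C}_N = \{ U \in SU(2^N) \mid U P U^\dagger \in \mathcal{P}_N \ \ \forall P \in \mathcal{P}_N \}, \]

\[ \mathcal{P}_N = \{\pm 1, \pm i\} \cdot \{I, X, Y, Z\}^{\otimes N}, \tag{3.1} \]

where I is the identity and X, Y, and Z are the usual Pauli matrices.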

The Clifford group can be generated by three elementary gates (see e. g.
[NC00]): the Hadamard gate H, the $\pi/4$ phase rotation S, and a two-qubit
gate, either the controlled NOT gate $\Lambda X$, or the controlled phase gate
$\Lambda Z$:

\[
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \quad
\Lambda X = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
\Lambda Z = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}
\tag{3.2}
\]


The significance of the Clifford group is due to the Gottesman-Knill
theorem ([Got98a], see also [NC00]):


Theorem 3.1 A quantum circuit using only the following elements (called a stabi¬ 
lizer circuit) can be simulated efficiently on a classical computer: 

• preparation of qubits in computational basis states 

• quantum gates from the Clifford group 

• measurements in the computational basis 

The proof of the theorem is simple after one introduces the notion of
stabilizer states [Got97]:

Definition 3.2 An N-qubit state $|\psi\rangle$ is called a stabilizer state if
it is the unique eigenstate with eigenvalue +1 of N commuting multi-local Pauli
operators $P_a$ (called the stabilizer generators):

\[ P_a |\psi\rangle = |\psi\rangle, \quad P_a \in \mathcal{P}_N, \quad a = 1, \ldots, N \]

(These N operators generate an Abelian group, the stabilizer, of $2^N$ Pauli
operators that all satisfy this stabilization equation.)

Computational basis states are stabilizer states. Furthermore, if a
Clifford gate U acts on a stabilizer state $|\psi\rangle$, the new state
$U|\psi\rangle$ is a stabilizer state with generators $U P_a U^\dagger$. Hence,
the state in a stabilizer circuit can always be described by the stabilizer
tableau, which is a matrix of $N \times N$ operators from {I, X, Y, Z} (where
each row is preceded by a sign factor). The effect of an n-qubit gate can then
be determined by updating nN elements of the matrix, which is an efficient
procedure.

Instead of on the stabilizer tableau, we shall base our state representation 
on graph states: 

Definition 3.3 An N-qubit graph state $|G\rangle$ is a quantum state associated
with a mathematical graph G = (V, E), whose |V| = N vertices correspond to the
N qubits, while the edges E describe quantum correlations, in the sense that
$|G\rangle$ is the unique state satisfying the N eigenvalue equations

\[ K_G^{(a)} |G\rangle = |G\rangle, \quad a \in V, \]

\[ \text{with} \quad K_G^{(a)} = \sigma_x^{(a)} \prod_{b \in \operatorname{ngbh} a} \sigma_z^{(b)} =: X_a \prod_{b \in \operatorname{ngbh} a} Z_b, \tag{3.3} \]

where $\operatorname{ngbh} a := \{ b \mid \{a, b\} \in E \}$ is the set of
vertices adjacent to a [RBB03, BR01, SW02].

The following theorem states that the edges of the graph can be associated
with phase gate interactions between the corresponding qubits:

Theorem 3.2 If one starts with the state
$|+\rangle^{\otimes N} = \prod_{a \in V} H_a |00 \ldots 0\rangle$, one can
easily construct $|G\rangle$ by applying $\Lambda Z$ on all pairs of
neighboring qubits:

\[ |G\rangle = \Big( \prod_{\{a,b\} \in E} \Lambda Z_{ab} \Big) \Big( \prod_{a \in V} H_a \Big) |0\rangle^{\otimes N} \tag{3.4} \]

(Proof: Insert Eq. (3.4) into Eq. (3.3) [HEB04].)
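For small N the statement can be checked directly on the state vector. The sketch below is an illustration only (function names are chosen here, and a dictionary from bit tuples to amplitudes stands in for the state vector). It uses the fact that applying the phase gates of Eq. (3.4) to the uniform superposition multiplies each basis amplitude by −1 once for every edge whose two endpoints are both 1, and then applies the operators of Eq. (3.3).

```python
from itertools import product


def graph_state(n, edges):
    """Amplitudes of |G>: uniform superposition with one sign flip per 'active' edge."""
    norm = 2 ** (-n / 2)
    state = {}
    for bits in product((0, 1), repeat=n):
        sign = 1
        for a, b in edges:
            if bits[a] and bits[b]:
                sign = -sign
        state[bits] = sign * norm
    return state


def apply_K(state, a, neighbours):
    """Apply X_a times the product of Z_b over the given neighbours of a."""
    out = {}
    for bits, amp in state.items():
        sign = 1
        for b in neighbours:
            if bits[b]:
                sign = -sign        # Z_b contributes -1 on |1>
        flipped = list(bits)
        flipped[a] ^= 1             # X_a flips bit a
        out[tuple(flipped)] = sign * amp
    return out
```

For the three-vertex path with edges (0, 1) and (1, 2), applying `apply_K` with the correct neighbourhood of each vertex returns the state unchanged, whereas a wrong neighbourhood does not.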

As the operators $K_G^{(a)}$ belong to the Pauli group, all graph states are
stabilizer states, and so are the states which we get by applying local
Clifford operators $C_i \in \mathcal{C}_1$ to $|G\rangle$. For such states, we
introduce the notation

\[ |G; C\rangle := |G; C_1, C_2, \ldots, C_N\rangle := \bigotimes_{i=1}^{N} C_i |G\rangle \tag{3.5} \]

It has been shown that all stabilizer states can be brought into this form
[Sch01, GKR02, NDM04b], i. e. any stabilizer state is LC-equivalent to a graph
state. (We call two states LC-equivalent if one can be transformed into the
other by applying a tensor product of local Clifford operators.) Finding the
graph state that is LC-equivalent to a stabilizer state given by a tableau can
be done by a sort of Gaussian elimination as explained in [NDM04b].

This is what we shall use to represent the current quantum state in the
memory of our simulator. Fig. 3.1 shows for an example state the tableau
representation that is usually employed (and also used by CHP, albeit in a
modified form) and our representation. The tableau representation requires
space of order $O(N^2)$. We store the graph in adjacency list form (i. e., for
each vertex, a list of its neighbors is stored), which needs space of order
O(Nd), where d is the average vertex degree (number of neighbors) in the graph.
We also store a list of the N local Clifford operators $C_1, \ldots, C_N$,
which transform the graph state $|G\rangle$ into the stabilizer state
$|G; C\rangle$. We call these operators the vertex operators (VOPs). As there
are only 24 elements in the local Clifford group, each VOP is represented as a
number in 0, ..., 23. The scheme to enumerate the 24 operators will be
described in [-] (see footnote 1). Note that we can disregard global phases of
the VOPs as they only lead to a global phase of the full state of the
simulator.

1 In the journal publication, a reference to "S. Anders, in preparation" was
given here. I regret that this information has not been published separately so
far, but it can now be found as Sec. 2.2 of the present thesis.
Figure 3.1: A stabilizer state $|\psi\rangle$ represented in different ways:
(a) as stabilizer tableau, i. e. the state is stabilized by the group of Pauli
operators generated by the operators in the 4 rows. This representation needs
space $O(N^2)$ for N qubits. (b), (c) as LC-equivalent to a graph state: (b)
shows the graph, with the VOPs given by their decomposition into the group
generators {H, S}; (c) is the data structure that represents (b) in our
algorithm. The VOPs are now specified using numbers between 0 and 23 (which
enumerate the $|\mathcal{C}_1| = 24$ LC operators). Here, we need space O(Nd),
where d is the average vertex degree, i. e. the average length of the adjacency
lists. Writing G for the graph in (b), we can use the notation of Eq. (3.5) and
write $|\psi\rangle = |G; H, I, HS, S\rangle$.


As we shall see later, we may typically assume that $d = O(\log N)$. Hence,
our representation needs considerably less space in memory than a tableau,
namely $O(N \log N)$, including $O(N)$ for the VOP list.

The Gaussian elimination needed to transform a stabilizer tableau to
its graph state representation is slow (time complexity $O(N^3)$), and so we
should avoid it in our simulator. But usually, one starts with the
initial state $|0\rangle^{\otimes N}$, and if we write this state already in
graph state form, the tableau representation is never used at all.

From Eq. (3.4), it is clear that the initial state can be written as a graph
with no edges and Hadamard gates acting on all vertices:

\[ |0\rangle^{\otimes N} = |(\{1, \ldots, N\}, \{\}); H, \ldots, H\rangle. \]
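As a rough illustration of this representation, the following sketch holds a register as adjacency sets plus one VOP per vertex and starts in the state with no edges and all VOPs equal to H. It is not GraphSim's actual class: the name `GraphRegisterSketch`, its methods, and the use of string labels instead of the numbers 0, ..., 23 are assumptions made here for readability.

```python
class GraphRegisterSketch:
    """Graph state plus vertex operators; a sketch, not GraphSim's actual class."""

    def __init__(self, n):
        # one neighbour set per vertex: the adjacency-list form of the graph
        self.neighbours = [set() for _ in range(n)]
        # |0...0> is the empty graph with a Hadamard as VOP on every vertex
        self.vop = ["H"] * n

    def add_edge(self, a, b):
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)

    def storage_units(self):
        """N VOP entries plus one adjacency entry per edge endpoint, i.e. N + N*d."""
        return len(self.vop) + sum(len(nb) for nb in self.neighbours)
```

The storage count makes the O(Nd) scaling explicit: every edge is stored twice, once in the set of each endpoint, so the adjacency part has N times the average degree d entries.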


3.3 Gates 

When the simulator is asked to simulate a Clifford gate, the current stabilizer
state is changed and its graph representation has to be updated to correctly
reflect the action of the gate. How to do this is the main technical result of
this paper.

Single-qubit gates 

In the graph representation, applying local (single-qubit) Clifford gates
becomes trivial: if $C \in \mathcal{C}_1$ is applied to qubit a, we replace
this qubit's VOP $C_a$ by $C C_a$.
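To make the table look-up concrete, the following sketch generates the 24 local Clifford operators (up to a global phase) from H and S and builds their multiplication table. Note the assumptions: the numbering produced here is simply the order in which the operators are found and is not the enumeration of Sec. 2.2, and all names are chosen for this illustration.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def key(m):
    """Hashable fingerprint of a 2x2 matrix with its global phase divided out."""
    flat = [x for row in m for x in row]
    pivot = next(x for x in flat if abs(x) > 1e-9)
    phase = pivot / abs(pivot)
    return tuple(complex(round((x / phase).real, 3) + 0.0,
                         round((x / phase).imag, 3) + 0.0) for x in flat)


R = 1 / math.sqrt(2)
I2 = ((1, 0), (0, 1))
H = ((R, R), (R, -R))
S = ((1, 0), (0, 1j))


def local_clifford_group():
    """Collect all products of H and S, identifying operators equal up to a phase."""
    elems, index, todo = [I2], {key(I2): 0}, [I2]
    while todo:
        g = todo.pop()
        for gen in (H, S):
            h = matmul(gen, g)
            k = key(h)
            if k not in index:
                index[k] = len(elems)
                elems.append(h)
                todo.append(h)
    return elems, index


ELEMS, INDEX = local_clifford_group()
# MULT[i][j] is the number of the product ELEMS[i] ELEMS[j]
MULT = [[INDEX[key(matmul(x, y))] for y in ELEMS] for x in ELEMS]


def apply_single_qubit_gate(vop, a, gate):
    """Replace C_a by C C_a with a single look-up in the multiplication table."""
    vop[a] = MULT[gate][vop[a]]
```

With such a table, a single-qubit gate costs one array access, independent of the number of qubits.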

Two-qubit gates 

It is sufficient if the simulator is capable of simulating a single
multi-qubit gate: As the entire Clifford group is generated, e. g., by H, S,
and $\Lambda Z$, all gates can be constructed by concatenating these. We chose
to implement $\Lambda Z$, the phase gate, as this is (because of its role in
Eq. (3.4)) most natural for the graph-state formalism.

In the following discussion, the two qubits onto which the phase gate acts
are called the operand vertices and denoted with a and b. All other qubits are
called non-operand vertices and denoted c, d, ...

To solve the task, we have to distinguish several cases.

Case 1. The VOPs of both operand vertices are in $\mathcal{Z}$, where
$\mathcal{Z} := \{I, Z, S, S^\dagger\}$ denotes the set of those four local
Clifford operators that commute with $\Lambda Z$ (the other 20 operators do
not). In this case, applying the phase gate is simple: We use the fact that
(due to Eq. (3.4)) applying a phase gate on a graph state just toggles an edge:

\[ \Lambda Z_{ab} |(V, E)\rangle = |(V, E \,\Delta\, \{\{a, b\}\})\rangle, \]

where $\Delta$ denotes the symmetric set difference
$A \,\Delta\, B := (A \cup B) \setminus (A \cap B)$, i. e. the edge {a, b} is
added to the graph if it was not present before, otherwise it is removed.
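On an adjacency-set representation, the edge toggle of Case 1 is a pair of symmetric-difference updates. The sketch below assumes the graph is held as a mapping from each vertex to the set of its neighbours; the function name is chosen here for illustration.

```python
def toggle_edge(neighbours, a, b):
    """Phase gate in Case 1: add the edge {a, b} if absent, remove it if present."""
    neighbours[a] ^= {b}   # symmetric difference with the one-element set {b}
    neighbours[b] ^= {a}
```

Applying the function twice restores the original graph, which reflects that the phase gate is its own inverse.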

Case 2. The VOP of at least one of the operand vertices is not in
$\mathcal{Z}$. In this case, just toggling the edge is not allowed because the
$\Lambda Z_{ab}$ cannot be moved past the non-$\mathcal{Z}$ VOP. But there is a
way to change the VOPs without changing the state, which works in the following
case:

Sub-case 2.1. Both operand vertices have non-operand neighbors. Here, the
following operation will help:

Definition 3.4 The operation of local complementation about a vertex a of a
graph G = (V, E), denoted $L_a$, is the operation that inverts the subgraph
induced by the neighborhood of a:

\[ L_a (V, E) = (V, E \,\Delta\, \{\{b, c\} \mid b, c \in \operatorname{ngbh} a\}) \]
This operation transforms the state into a local-Clifford equivalent one, as
the following theorem, taken from [HEB04, NDM04b], asserts:

Theorem 3.3 Applying the local complementation $L_a$ onto a graph G yields a
state $|L_a G\rangle = U |G\rangle$, with the multi-local unitary

\[ U = \sqrt{-iX_a} \prod_{b \in \operatorname{ngbh} a} \sqrt{iZ_b} \]

Note that the operator $\sqrt{iZ}$ is related to the phase operator S of
Eq. (3.2):

\[ \sqrt{iZ} = e^{i\pi/4} S^\dagger, \quad \text{and} \quad \sqrt{-iX} = \frac{1}{\sqrt{2}} (I - iX). \]

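An obvious consequence of Theorem 3.3 is the following.

Corollary 3.1 A state $|G; C\rangle$ is invariant under application of $L_a$ to
G, followed by an updating of C according to

\[ C_b \mapsto \begin{cases}
C_b \sqrt{iX} & \text{for } b = a \\
C_b \sqrt{-iZ} & \text{for } b \in \operatorname{ngbh} a \\
C_b & \text{otherwise}
\end{cases} \tag{3.6} \]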
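The combined update of graph and VOPs can be sketched as follows. To keep the example independent of any particular numbering of the 24 operators, each VOP is kept here as a list of factor labels, so that right-multiplication is simply appending to the list; this bookkeeping and all names are assumptions of the sketch, not the thesis's data structure.

```python
from itertools import combinations


def local_complementation(neighbours, vop, a):
    """Invert the subgraph on the neighbourhood of a and record the VOP correction."""
    nbrs = sorted(neighbours[a])
    for b, c in combinations(nbrs, 2):
        neighbours[b] ^= {c}                 # toggle the edge {b, c}
        neighbours[c] ^= {b}
    for b in nbrs:
        vop[b] = vop[b] + ["sqrt(-iZ)"]      # neighbours of a: C_b -> C_b sqrt(-iZ)
    vop[a] = vop[a] + ["sqrt(iX)"]           # the centre:      C_a -> C_a sqrt(iX)
```

On the path 0 - 1 - 2, a local complementation about vertex 1 adds the edge {0, 2}; a second one removes it again, while the edges incident on vertex 1 are never touched.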

Now note that the local Clifford group is generated not only by S and H
but also by $\sqrt{-iX}$ and $\sqrt{iZ}$, the Hermitian adjoints of the
operators right-multiplied to the VOPs in Eq. (3.6). Our simulator has a
look-up table that spells out every local Clifford operator as a product of
(as it turns out) at most 5 of these two operators, times a disregarded global
phase. For example, the table's line for H reads:

\[ H \propto \sqrt{-iX} \sqrt{iZ} \sqrt{iZ} \sqrt{iZ} \sqrt{-iX}. \tag{3.7} \]
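Such a table line can be checked numerically. The sketch below assumes the principal branch of the square roots, i. e. $\sqrt{-iX} = (I - iX)/\sqrt{2}$ and $\sqrt{iZ} = \mathrm{diag}(e^{i\pi/4}, e^{-i\pi/4})$, and tests proportionality up to a complex factor; the helper names are chosen here.

```python
import cmath
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
SQRT_MINUS_IX = [[R, -1j * R], [-1j * R, R]]
SQRT_IZ = [[cmath.exp(1j * math.pi / 4), 0], [0, cmath.exp(-1j * math.pi / 4)]]


def product_of(factors):
    """Multiply the factors in the order written, left to right."""
    result = [[1, 0], [0, 1]]
    for f in factors:
        result = matmul(result, f)
    return result


def proportional(u, v, tol=1e-9):
    """True if u equals v times a single complex factor."""
    pairs = [(u[i][j], v[i][j]) for i in range(2) for j in range(2)]
    ratio = next(x / y for x, y in pairs if abs(y) > tol)
    return all(abs(x - ratio * y) < tol for x, y in pairs)


DECOMPOSITION_OF_H = [SQRT_MINUS_IX, SQRT_IZ, SQRT_IZ, SQRT_IZ, SQRT_MINUS_IX]
```

With these conventions the product equals H times the phase factor i, so the two sides of Eq. (3.7) agree up to the disregarded global phase.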

This allows us now to reduce the VOP $C_a$ of any non-isolated vertex a to
the identity I by proceeding as follows: The decomposition of $C_a$ taken from
the look-up table is read from right to left. When a factor $\sqrt{-iX}$ is
read, we do a local complementation about a. This does not change the state if
the correction of Eq. (3.6) is applied, which right-multiplies a factor
$\sqrt{iX}$ to $C_a$. This factor $\sqrt{iX}$ cancels with the factor
$\sqrt{-iX}$ at the right-hand end of $C_a$'s decomposition, so that we now
have a VOP with a shorter decomposition.

If the right-most operator of the decomposition is $\sqrt{iZ}$, we do a local
complementation about an arbitrarily chosen neighbor of a, called a's
"swapping partner". Now, the correction operation will lead to a factor
$\sqrt{-iZ} \propto S$ being right-multiplied to $C_a$, again shortening the
decomposition.

Note that a local complementation about a never changes the edges incident
on a and hence, if a was non-isolated in the beginning of the procedure,
it will stay so. This is important, as only a non-isolated vertex can have a
swapping partner. Hence, the procedure can be iterated, and (as the
decompositions have a maximum length of 5) after at most 5 iterations, we are
left with the identity I as VOP.

We apply the described "VOP reduction procedure" to both operand vertices.
After that, the VOPs of both vertices are the identity, and we can proceed as
in Case 1.

One might wonder, however, whether the use of the VOP reduction procedure
on the second operand vertex b spoils the reduction of the VOP of the first
operand a. After all, a could be a neighbor of b or of the swapping partner c
of b. Then, if a local complementation $L_b$ or $L_c$ is performed, the
compensation according to Eq. (3.6) changes the VOPs of the neighbors of b and
c (which include a). But note that a neighbor of the inversion center only gets
a factor $\sqrt{-iZ} \propto S$. As S generates $\mathcal{Z}$, this means that
after the reduction of b, the VOP of a might no longer be the identity, but it
is still an element of $\mathcal{Z}$, and we are allowed to go on with Case 1.

But what happens, if one of the vertices does not have a non-operand 
neighbor, that could serve as swapping partner? This is the next Sub-case. 

Sub-case 2.2. At least one of the operand vertices is isolated or only connected to 
the other operand vertex. We first assume that the other vertex is non-connected 
in the same sense: 

Sub-sub-case 2.2.1. Both operand vertices are either completely isolated, or 
only connected with each other. Then, we can ignore all other vertices and have 
to study only a finite, rather small number of possible states. 

Let us denote by • • the 2-vertex graph with no edges, and by •-• the
2-vertex graph with one edge. There are only very few possible 2-qubit
stabilizer states, namely those in

\[ \mathcal{S}_2 := \{ |G; C_1, C_2\rangle \mid G \in \{ \bullet\ \bullet,\ \bullet\!-\!\bullet \},\ C_1, C_2 \in \mathcal{C}_1 \}. \tag{3.8} \]

Of course, many of the assignments on the r.h.s. describe the same state, such
that $|\mathcal{S}_2| < 2 \cdot 24^2$. Remember that the phase gate
$\Lambda Z_{1,2}$ (being a Clifford operator) maps $\mathcal{S}_2$ bijectively
onto itself.

The function table of
$\Lambda Z_{1,2}|_{\mathcal{S}_2} : |G; C_1, C_2\rangle \mapsto |G'; C'_1, C'_2\rangle$
can easily be computed in advance (we did it with Mathematica) and hard-coded
into the simulator as a look-up table. This table contains $2 \cdot 24^2$ lines
such as

I* # /C[i 3 ],C[ 2 ]) i * I* */G[q],C[ 2 ]) , (3.9) 

where the $C_{[i]}$ ($i = 0, \ldots, 23$) are the Clifford operators in the
enumeration detailed in Sec. 2.2 (e. g. $C_{[0]} = I$, $C_{[2]} = Y$).

Note that many of the assignments to $C_1$ and $C_2$ in Eq. (3.8) describe the
same state. Hence, we have a choice in the operators $C'_1$, $C'_2$ with which
we represent the results of the phase gate in the look-up table. It turns out
(by inspection of all the possibilities) that we can always choose the
operators such that the following constraint is fulfilled:

Constraint 1. If $C_1$ ($C_2$) $\in \mathcal{Z}$, choose $C'_1$, $C'_2$ such
that again $C'_1$ ($C'_2$) $\in \mathcal{Z}$.

The use of this will become clear soon.

Sub-sub-case 2.2.2. We are left with one last case, namely that one vertex, let
it be a, is connected with non-operand neighbors, but the other vertex b is
not, i. e. has either no neighbors or only a as neighbor. Then, we proceed
as follows: We use iterated local complementations to reduce $C_a$ to I. After
that, we may use the look-up table as in Sub-sub-case 2.2.1. That this is
allowed even though a is connected to a non-operand vertex is shown in the
following: First note that the state after the reduction of $C_a$ to I can be
written (following Eq. (3.5)) as

\begin{align*}
|(V, E); C\rangle &= \prod_{c \in V} C_c \prod_{\{c,d\} \in E} \Lambda Z_{cd} \, |{+}{+}\cdots{+}\rangle \\
&= \underbrace{\prod_{c \in V \setminus \{a,b\}} C_c \prod_{\{c,d\} \in E \setminus \{\{a,b\}\}} \Lambda Z_{cd}}_{C_b \text{ and } \Lambda Z_{ab} \text{ commute with this}}
\ \underbrace{C_b \, (\Lambda Z_{ab})^{\zeta} \, |{+}{+}\cdots{+}\rangle}_{= \, |+\rangle^{\otimes (N-2)} \otimes |\phi\rangle_{ab} \text{ with } |\phi\rangle \in \mathcal{S}_2 \quad (*)}
\end{align*}

(where $\zeta = 0, 1$ indicates whether $\{a, b\} \in E$). Observe that $C_b$
has been moved past the operators $\Lambda Z_{cd}$. This is allowed because
none of the $\Lambda Z_{cd}$ acts on b.

We now apply $\Lambda Z_{ab}$ to this state. $\Lambda Z_{ab}$ can be moved
through all the phase gates and vertex operators above the left brace so that
it stands right in front of the $\mathcal{S}_2$ state $|\phi\rangle_{ab}$,
which is separated from the rest. Thus, the table (3.9) from Sub-sub-case 2.2.1
may be used. (This would not be the case if, in the state above the brace
marked with "(*)", the two operand vertices were still entangled with other
qubits.) The table look-up will give new operators $C'_a$, $C'_b$ and a new
$\zeta'$, so that the new state has the following form:

\[ \Lambda Z_{ab} |(V, E); C\rangle = \prod_{c \in V \setminus \{a,b\}} C_c \prod_{\{c,d\} \in E \setminus \{\{a,b\}\}} \Lambda Z_{cd} \ C'_a C'_b \, (\Lambda Z_{ab})^{\zeta'} \, |{+}{+}\cdots{+}\rangle \tag{3.10} \]


For this to be a state in our usual $|G; C\rangle$ form (3.5), the two
operators $C'_a$ and $C'_b$ have to be moved to the left, through the
$\Lambda Z_{cd}$. For $C'_b$, this is no problem, as b was assumed to be either
isolated or connected only to a, so that $C'_b$ commutes with
$\prod_{\{c,d\} \in E \setminus \{\{a,b\}\}} \Lambda Z_{cd}$, as the latter
operator does not act on b. The vertex a, however, has connections to
non-operand neighbors, so that some of the $\Lambda Z_{cd}$ act on it. We may
move it only if $C'_a \in \mathcal{Z}$ (as this means that it commutes with
$\Lambda Z$). Luckily, due to Constraint 1 imposed above, we can be sure that
$C'_a \in \mathcal{Z}$, because $C_a = I \in \mathcal{Z}$.
     1  cphase (vertex a, vertex b):
     2     if ngbh a \ {b} ≠ {}:
     3        remove_VOP (a, b)
     4     end if
     5     if ngbh b \ {a} ≠ {}:
     6        remove_VOP (b, a)
     7     end if
     8     [It may happen that the condition in line 2 has not been fulfilled
           then, but is now due to the effect of line 5. So we check again:]
     9     if ngbh a \ {b} ≠ {}:
    10        remove_VOP (a, b)
    11     end if
    12     [Now we can be sure that the condition (ngbh c \ {a, b} = {} or
           VOP[c] ∈ Z) is fulfilled for c = a, b and we may use the lookup
           table (cf. Eq. (3.9)).]
    13     if {a, b} ∈ E:
    14        edge ← true
    15     else:
    16        edge ← false
    17     end if
    18     (edge, VOP[a], VOP[b]) ← cphase_table[edge, VOP[a], VOP[b]]

Listing 3.1a: Pseudo-code for controlled phase gate ($\Lambda Z$) acting on
vertices a and b (cphase) [here], and for the two auxiliary routines remove_VOP
[Listing 3.1b] and local_complementation [Listing 3.1c].


Listing 3.1 shows in pseudo-code how these results can be used to actually
implement the controlled phase gate $\Lambda Z$.


3.4 Measurements 


In a stabilizer circuit, the simulator may be asked at any point to simulate 
the measurement of a qubit in the computational basis. How the outcome 
of the measurement is determined, and how the graph representation has 
to be updated in order to then represent the post-measurement state will be 
explained in the following. 

    19  remove_VOP (vertex a, vertex b):
    20     [This reduces VOP[a] to I, avoiding (if possible) the use of b as
           swapping partner.]
    21     [First, we choose a swapping partner c.]
    22     if ngbh a \ {b} ≠ {}:
    23        c ← any element of ngbh a \ {b}
    24     else:
    25        c ← b
    26     end if
    27     d ← decomposition_lookup_table [VOP[a]]
    28     [d now contains a decomposition such as Eq. (3.7)]
    29     for v from last factor of d to first factor of d
    30        if v = √(−iX):
    31           local_complementation (a)
    32        else: (this means that v = √(iZ))
    33           local_complementation (c)
    34        end if
    35     [Now, VOP[a] = I.]

Listing 3.1b


    36  local_complementation (vertex v)
    37     [performs the operation specified in Definition 3.4]
    38     n_v ← ngbh v
    39     for i ∈ n_v:
    40        for j ∈ n_v:
    41           if i < j:
    42              if (i, j) ∈ E:
    43                 remove edge (i, j)
    44              else:
    45                 add edge (i, j)
    46              end if
    47           end if
    48        end for
    49        VOP[i] ← VOP[i] √(−iZ)
    50     end for
    51     VOP[v] ← VOP[v] √(iX)

Listing 3.1c

To measure a qubit a of a state $|G; C\rangle$ in the computational basis means
to measure the qubit in the underlying graph state $|G\rangle$ in one of the 3
Pauli bases. Writing the measurement outcome as $\zeta$, this means:

\begin{align*}
\frac{I + (-1)^{\zeta} Z_a}{2} \, |G; C\rangle
&= \Big( \prod_{b \in V \setminus \{a\}} C_b \Big) \frac{I + (-1)^{\zeta} Z_a}{2} \, C_a |G\rangle \\
&= \Big( \prod_{b \in V \setminus \{a\}} C_b \Big) C_a \, \frac{I + (-1)^{\zeta} C_a^\dagger Z_a C_a}{2} \, |G\rangle \tag{3.11}
\end{align*}

As $C_a$ is a Clifford operator,
$P_a := C_a^\dagger Z_a C_a \in \{X_a, Y_a, Z_a, -X_a, -Y_a, -Z_a\}$.
Thus, in order to measure qubit a of $|G; C\rangle$ in the computational basis,
we measure the observable $P_a$ on $|G\rangle$. Note that in case $P_a$ is the
negative of a Pauli operator, the measurement result $\zeta$ to be reported by
the simulator is the complement of the result given by the X, Y or Z
measurement on the underlying graph state $|G\rangle$.
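The observable $P_a$ and its sign can be determined by a small matrix computation. The following sketch uses plain 2x2 matrices; the function names `measured_observable` and `reported_outcome` are chosen here for illustration. It conjugates Z with a given vertex operator and compares the result with the six signed Pauli matrices.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


PAULIS = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def measured_observable(c):
    """Return (sign, name) such that C^dagger Z C = sign * name."""
    p = matmul(matmul(dagger(c), PAULIS["Z"]), c)
    for name, sigma in PAULIS.items():
        for sign in (1, -1):
            if all(abs(p[i][j] - sign * sigma[i][j]) < 1e-9
                   for i in range(2) for j in range(2)):
                return sign, name
    raise ValueError("c does not map Z to a signed Pauli operator")


def reported_outcome(sign, zeta_graph):
    """Complement the graph-state result if P_a is the negative of a Pauli operator."""
    return zeta_graph if sign == 1 else 1 - zeta_graph


R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
S = [[1, 0], [0, 1j]]
```

For example, a vertex with VOP H is measured in the X basis of the graph state, and a vertex with VOP HS in the Y basis with a complemented result.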

How is the graph G changed and how do the vertex operators have to be
modified if the measurement $\frac{I \pm P_a}{2} |G\rangle$ is carried out?
This has been worked out in detail in Ref. [HEB04], which we now briefly review
for the present purpose.

The simplest case is that of $P_a = \pm Z$. Here, the state changes as follows
(up to normalization):

\[ \frac{I + (-1)^{\zeta} Z_a}{2} \, |(V, E)\rangle \propto
\underbrace{X_a^{\zeta} \Big( \prod_{b \in \operatorname{ngbh} a} Z_b^{\zeta} \Big) H_a}_{(*)}
\, |(V, E \setminus \{\{a, b\} \mid b \in \operatorname{ngbh} a\})\rangle. \tag{3.12} \]

The value of $\zeta$ is chosen at random (using a pseudo-random number
generator). To update the simulator state, the VOPs are right-multiplied with
the under-braced operators (*) and the edges incident on a are deleted as
indicated in the ket.
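The Z-measurement update can be sketched on the adjacency-set representation as follows. As before, VOPs are kept as lists of factor labels so that right-multiplication is list concatenation; this bookkeeping, the function name, and the `rng` parameter (any object with a `randrange` method) are assumptions of the sketch. It covers the case $P_a = +Z$; for $P_a = -Z$ the reported result would be complemented.

```python
import random


def measure_z_on_graph(neighbours, vop, a, rng=random):
    """Measure Z on vertex a of the underlying graph state; returns the outcome."""
    zeta = rng.randrange(2)                   # outcome chosen at random
    for b in sorted(neighbours[a]):
        neighbours[b].discard(a)              # delete the edges incident on a
        if zeta:
            vop[b] = vop[b] + ["Z"]           # byproduct Z on former neighbours
    neighbours[a] = set()
    # the isolated vertex a is left in |0> = H|+> or |1> = X H|+>
    vop[a] = vop[a] + (["X"] if zeta else []) + ["H"]
    return zeta
```

The cost is proportional to the degree of the measured vertex, since only its incident edges and the VOPs of its neighbours are touched.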

A measurement of the Y observable ($P_a = \pm Y$) requires a complementation
of the edge set according to

\[ E \mapsto E \,\Delta\, \{\{b, c\} \mid b, c \in \operatorname{ngbh} a\} \]

and a change in the VOPs as follows:

\[ C_b \mapsto C_b \sqrt{-iZ}^{\,(\dagger)} \quad \text{for } b \in \operatorname{ngbh} a \cup \{a\}, \]

where the dagger in parentheses is to be read only for measurement result
$\zeta = 1$.
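The most complicated case is the X measurement, which requires an update of
edges and VOPs as follows:

\begin{align*}
E \mapsto E \ &\Delta\ \{\{c, d\} \mid c \in \operatorname{ngbh} b,\ d \in \operatorname{ngbh} a\} \\
&\Delta\ \{\{c, d\} \mid c, d \in \operatorname{ngbh} b \cap \operatorname{ngbh} a\} \\
&\Delta\ \{\{b, d\} \mid d \in \operatorname{ngbh} a \setminus \{b\}\}
\end{align*}

\[ C_c \mapsto \begin{cases}
C_c Z^{\zeta} & \text{for } c = a \\
C_c \sqrt{iY}^{\,(\dagger)} & \text{for } c = b \text{ (read ``}\dagger\text{'' only for } \zeta = 1) \\
C_c Z & \text{for } c \in \begin{cases}
\operatorname{ngbh} a \setminus \operatorname{ngbh} b \setminus \{b\} & (\text{for } \zeta = 0) \\
\operatorname{ngbh} b \setminus \operatorname{ngbh} a \setminus \{a\} & (\text{for } \zeta = 1)
\end{cases} \\
C_c & \text{otherwise}
\end{cases} \tag{3.13} \]

Here, b is a vertex chosen arbitrarily from $\operatorname{ngbh} a$ and
$\sqrt{iY} = \frac{1}{\sqrt{2}} (I + iY)$.

In all these cases the measurement result is chosen at random. Only in
case of the measurement of $P_a = \pm X$ on an isolated vertex, the result is
always $\zeta = 0$ (which means an actual result of $\zeta = 0$ for $P_a = X$
and $\zeta = 1$ for $P_a = -X$).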



3.5 Implementation 

The algorithm described above has been implemented in C++ in object-oriented
programming style. We have used the GNU Compiler Collection (GCC) [GCC] under
Linux, but it should be easy to compile the program on other platforms as well
(see footnote 2). The implementation is done as a library to allow for easy
integration into other projects. We also offer bindings to Python [R+], so that
the library can be used by Python programs as well. (This was achieved using
SWIG [B+].)

The simulator, called "GraphSim", can be downloaded from
http://homepage.uibk.ac.at/homepage/c705/c705213/work/graphsim.html.

A detailed documentation of the library is supplied with it. To demonstrate
the usage here at least briefly, we give Listing 3.2 as a simple toy example.
It is written in Python and is a complete program.

In the example, we start by loading the GraphSim library (line 2) and
then initialize a register of 8 qubits (line 4), which are then all in the
$|0\rangle$ state. We get an object called "gr" of class GraphRegister, which
represents the register of qubits. For all following operations, we use the
methods of gr to access its functionality. In our example, we simply build up
an encoded "0" state in the well-known 7-qubit Steane code, which we then
measure.

2 We use only ISO Standard C++ with one exception: The hash_set template is
used, which is, though not part of the standard, supplied by most modern
compilers.

     1  import random
     2  import graphsim
     3
     4  gr = graphsim.GraphRegister (8)
     5
     6  gr.hadamard (4)
     7  gr.hadamard (5)
     8  gr.hadamard (6)
     9  gr.cnot (6, 3)
    10  gr.cnot (6, 1)
    11  gr.cnot (6, 0)
    12  gr.cnot (5, 3)
    13  gr.cnot (5, 2)
    14  gr.cnot (5, 0)
    15  gr.cnot (4, 3)
    16  gr.cnot (4, 2)
    17  gr.cnot (4, 1)
    18
    19  for i in range (7):
    20      gr.cnot (i, 7)
    21
    22  print (gr.measure (7))
    23
    24  gr.print_stabilizer ()

Listing 3.2: A simple example in Python

First, we apply Hadamard and CNOT gates to the qubits with numbers 0
through 6 in order to build up the Steane-encoded "0" (lines 6-17). To check
that we did so, we measure the encoded qubit, which is done by using CNOT
gates to sum up the parity of these qubits in the eighth qubit ("qubit 7")
(lines 19, 20). Measuring qubit 7 then gives "0", as it should (line 22).

For further details on using the GraphSim library from a C++ or Python
program, please see the documentation supplied with the source code.

With approximately 1400 lines, GraphSim is complex enough that one
cannot take for granted that it faithfully implements the described algorithm
without bugs, and testing is necessary. Fortunately, this can be done very
conveniently by comparing with Aaronson and Gottesman's "CHP" simulator.
As these two programs use quite different algorithms to do the same task, it
is very unlikely that any bugs they might have would produce the same false
results. Hence, if both programs give the same result, they can reasonably be
considered both to be correct.

We set up a script to do random gates and measurements on a set of qubits
for millions of iterations. All operations were performed simultaneously with
CHP and GraphSim. For measurements whose outcome was chosen at random by CHP,
a facility of GraphSim was used that overrides the random choice of
measurement outcomes and instead uses a supplied value. For
measurements with determined outcome, however, it was checked whether
both programs output the same result. Also, every 1000 steps, the stabilizer
tableau of GraphSim's state was calculated from its graph representation and
compared to CHP's tableau (see footnote 3).

After simulating $4 \cdot 10^6$ operations on 200 qubits in 18 hours and
$2 \cdot 10^8$ operations on 20 qubits in 19.7 hours without seeing
discrepancies, we are confident that we have exhausted all special cases, so
that the two programs can be assumed to always give the same output. As they
are based on very different algorithms, this reasonably allows us to conclude
that they both operate correctly.


3.6 Performance 

We now show that our simulator yields the promised performance, i. e.
performs a simulation of M steps in time of order O(NdM), where N is the
number of qubits and d the maximum vertex degree that is encountered during
the calculation. Let us go through the different possible simulation steps
in order to assess their respective time requirements.

3 This was done with a Mathematica subroutine which tries to find a row adding
and swapping arrangement to transform one tableau into the other.

Single-qubit gates are fastest: they only need one look-up in the
multiplication table of the local Clifford group (which is hard-coded into the
simulator), and are hence of time complexity $O(1)$.

Measurements have a complexity depending on the basis in which they
have to be carried out. For a $Z$ measurement, we have to remove the $\deg a$
edges of the measured vertex $a$. As $d$ is the maximum vertex degree that is to
be expected within the studied problem, the complexity of a $Z$ measurement
is $O(d) \le O(N)$ (as $d < N$).

For a $Y$ or $X$ measurement, we have to do local complementations, which
require dealing with up to $d(d-1)/2$ edges, and hence, the overall complexity
of measurements is $O(d^2)$.

For the phase gate, the same holds. Here, we need a fixed number (up to
5) of local complementations. Thus, measurements and two-qubit gates take
$O(d^2)$ time.
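The quadratic edge count can be made concrete with a minimal sketch of local
complementation on a graph stored as adjacency sets. The function name
`local_complement` and the dictionary-of-sets layout are assumptions made for
illustration; they are not meant to describe GraphSim's actual data structures.

```python
from itertools import combinations


def local_complement(adj, a):
    """Complement the subgraph induced by the neighbours of vertex `a`.

    `adj` maps each vertex to the set of its neighbours. The return value is
    the number of vertex pairs that were toggled, deg(a) * (deg(a) - 1) / 2.
    """
    toggled = 0
    for u, v in combinations(sorted(adj[a]), 2):
        if v in adj[u]:
            adj[u].discard(v)      # edge present: remove it
            adj[v].discard(u)
        else:
            adj[u].add(v)          # edge absent: create it
            adj[v].add(u)
        toggled += 1
    return toggled


# Star graph: vertex 0 is joined to vertices 1..4, so deg(0) = 4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
pairs = local_complement(star, 0)   # touches 4 * 3 / 2 = 6 pairs
```

Applying the operation twice to the same vertex restores the original graph,
which is a convenient sanity check for any implementation.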

This would be no improvement over Aaronson and Gottesman's algorithm
if we had $d = O(N)$. The latter is indeed the case if one applies randomly
chosen operations, as we did to demonstrate GraphSim's correctness. There,
we indeed did not observe any superiority of GraphSim in run-time.

In practice, however, the situation is quite different. For example, when
simulating quantum error correction (QEC), one can reasonably assume
$d = O(\log N)$. This is because all QEC schemes avoid doing too many
operations on one and the same qubit in a row, as this would spread errors. So,
vertex degrees remain small. The same reasoning applies to entanglement
purification schemes and, more generally, to all circuits which are designed to
be robust against noise.

The space complexity is dominated by the space needed to store the quantum
state representation. As argued in Section 3.2, this requires only space of
$O(N \bar d)$, where $\bar d$ is the average vertex degree. As explained above,
we may expect $\bar d$ (like $d$) to scale sub-linearly with $N$ in typical
applications, in many applications as $O(\log N)$, which gives a space
requirement of $O(N \log N)$. This is what allows us to handle substantially
more qubits than is possible with the $O(N^2)$ tableau representation.

As a first practical test, we used GraphSim to simulate entanglement
purification of cluster states with the protocol of Ref. [DAB03]. This has been a
starting point of a detailed analysis of the communication costs of establishing
multipartite entangled states via noisy channels [KADB06]. Fig. 3.2
demonstrates that GraphSim is indeed suitable for this purpose. Note that
for the right-most data points, the register holds 30,000 qubits.

As we did a Monte Carlo simulation, we had to loop the calculation very
often and still got an output within a few hours. For simulations involving
several millions of qubits and a large number of runs, we waited about a
week for the results when using eight processors in parallel. We redid some
of these calculations in a more controlled testing environment as a benchmark
for GraphSim. Fig. 3.3 shows the results in a log-log plot.

Figure 3.2: Comparison of the performance of CHP and GraphSim. A simulation
of entanglement purification was used as sample application. The register
has 1000 times the size of the states to hold an ensemble of 1000 states.


3.7 Conclusion 

To summarize, we have used recent results on graph states to find a very
space-efficient representation of stabilizer states, and determined how this
representation changes under the action of Clifford gates. This can be used to
simulate stabilizer circuits more efficiently than previously possible. The gain
is not only in simulation speed, but also in the number of manageable qubits.
In the latter, at least two orders of magnitude are gained. We have presented
an implementation of our simulation algorithm and will soon publish results
on entanglement purification which make use of our new technique.

Figure 3.3: Benchmark of GraphSim for very large registers (horizontal axis:
width of register, i. e., number of qubits). Entanglement purification
(specifically, the purification of 10-qubit cluster states with the
protocol of Ref. [DAB03]) was used as sample problem. The register was
filled up with cluster states to make a large ensemble, and two protocol steps
were simulated. The average time per operation was obtained from the total
run-time. [Giving the time per operation in seconds is of use only when one
specifies the machine which has run the code: We used Linux computers with
AMD Opteron processors, clocked at 2.2 GHz. Only one of the machine's
several processors was dedicated to our computation task. The code was
compiled using the GNU C++ compiler (version 3.2.3) with 64-bit target and
"O3" optimization.]
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Chapter 4 

Upper bounds to the 
fault-tolerance threshold 


As already stated in Chapter 3, a Clifford network simulator such as GraphSim
promises to be a useful tool in investigating the fault tolerance of quantum
computation. In this chapter, I wish to outline a suggestion of how this might
be done. After reviewing certain relevant aspects of the theories of quantum
error correction and fault-tolerant quantum computation (Secs. 4.1 and 4.2),
we study how quantum error correction has been simulated so far (Sec. 4.3) and
then suggest a way in which a stabilizer simulator might be used to improve on
that (Sec. 4.4).


4.1 Brief overview on quantum error correction 

When quantum computing was first proposed, it was met with substantial
scepticism due to the problem of decoherence: on the one hand, a quantum
register has to be shielded against uncontrolled interactions with the
environment which lead to decoherence; on the other hand, one needs to interact
with the quantum register in order to carry out the operations and measurements
that constitute the computation. It was believed that these two requirements
are impossible to reconcile, as was pointed out especially by Landauer
and by Unruh [Unr95]. On a classical computer, the problem is solved by
storing the information in a redundant way. 1 Due to the no-cloning theorem
[WZ82], one cannot make a "back-up copy" of a qubit. One can, however,
distribute the information of a single ("logical") qubit over several physical
qubits such that, when a noise event of limited extent disturbs some of these
qubits, one can identify the action of the noise and undo it without measuring
or disturbing the information of the logical qubit, which hence stays intact.
This was realized by Shor [Sho95], who proposed what is now known as the
Shor 9-qubit code. This code, however, allows only for prolonged storage of
quantum information, but not for manipulating the information in order to
do quantum computation without first decoding it.

1 Actually, due to the robustness of contemporary electronics, the elaborate
theory of classical error correction by means of codes with redundancy is not
even really needed for the volatile memory of today's computers. Nevertheless,
a tremendous redundancy is achieved by the simple fact that in a DRAM or an
SRAM, a charge of not a single electron, but of a mesoscopic number of
electrons represents every single bit.

What was needed is a code which allows one to perform quantum gates in
an "encoded" fashion, i. e., the gates should act on the encoded qubits and
produce their result in encoded form, such that the information is never
"bare" and hence exposed to full decoherence. Such codes, together with
schemes for the encoded gates, were found more or less simultaneously
by Calderbank and Shor [CS96] and by Steane [Ste96b] and are hence
known as CSS codes. A better understanding of these and an extension to
more codes was soon reached through the introduction of the stabilizer
formalism [Got96, Got97]. A way to understand such stabilizer codes in terms of
graphs (in a manner very similar to the graph states mentioned in the previous
chapters) has also been found [GKR02, Sch01, SW02] but will not be used here.

Assuming the reader's familiarity with the subject (a good overview
is provided in Nielsen and Chuang's textbook [NC00] and in Preskill's
overview articles [Pre98c, Pre98a]), we only briefly review some key features
which are relevant for this chapter's discussion.

The most remarkable property of CSS codes is the fact that Clifford gates
are "transversal" (or can at least be decomposed into transversal gates). This
means that in order to perform an encoded Clifford gate on encoded (logical)
qubits, one simply acts with bare versions of the gate on each of the bare
(physical) qubits. To perform, for instance, a CNOT gate on two logical qubits,
each encoded with 7 physical qubits according to the [[7,1,3]] CSS code, one
performs a CNOT on each of the 7 pairs of corresponding physical qubits. After
each gate, the error syndrome is measured, i. e., it is checked whether an error
has occurred, and if so, which one. The syndrome can indicate for each of the
bare qubits whether a bit flip, a phase flip, a bit-and-phase flip or no error
has happened, provided that no more errors have happened than the code is able
to correct. 2

As the Clifford gates alone do not allow for universal quantum computation,
we need at least one additional gate to have a universal gate set, and
this gate cannot be expected to allow for a transversal encoded implementation
[ZCC07]. Finding a non-transversal way of implementing such a gate's
action on the encoded qubits is the main challenge. The first works gave
explicit constructions for the chosen code, either for the Toffoli gate [Sho96]
or for the preparation of a specific encoded state, which in turn allows one to
perform a Toffoli gate [KLZ96, KLZ98b]. Later, Gottesman and Chuang introduced
the concept of gate teleportation, which gives rise to a general scheme to
implement fault-tolerant non-transversal gates with all CSS codes [GC99].

2 Recall that an $[[n, k, d]]$ code encodes $k$ logical qubits in $n$ physical
qubits and can detect up to $d - 1$ errors, but only indicates the proper
correction operation if at most $(d - 1)/2$ qubits are affected by errors.

4.2 The fault-tolerance threshold 

In the newer literature, a distinction is often made between the terms quantum 
error correction and fault-tolerant quantum computation. Subject of the former is 
the study of quantum codes and their error correcting capabilities, subject of 
the latter is the question whether this really keeps the quantum computation 
safe from decoherence and which additional steps have to be taken to ensure 
this. The goal is to construct a quantum computer capable of performing 
computations of arbitrary length. Just using a code as described above is not 
sufficient here. 

Code concatenation [KL96] is a solution. 3 Leaving out certain non-negligible
subtleties for a first simplified explanation, it works as follows:
Given a bare error rate $\varepsilon_0$, which is the probability that an error
occurs within one computational step (i. e., one gate), the probability per step
that decoherence corrupts the state of the quantum register is reduced from
$\varepsilon_0$ in the case of computation without encoding to
$c \varepsilon_0^2$ (where $c$ is a constant) when employing a quantum code that
fails only when at least two errors occur in the same logical qubit. If one
replaces each of the bare qubits again by an encoded logical qubit, that is, one
concatenates the code with itself, the error rate drops to
$c (c \varepsilon_0^2)^2$, and for $\Lambda$ concatenation levels, one gets

$\varepsilon_\Lambda = (c \varepsilon_0)^{2^\Lambda} / c$.

It is easy to see that for $\Lambda \to \infty$ this double exponential
converges very quickly to either 0 (the computation virtually never crashes 4 )
or to 1 (it virtually always crashes), depending on whether the bare error
rate $\varepsilon_0$ is below or above a threshold $\varepsilon_{\mathrm{th}}$.
This remarkable fact is known as the threshold theorem, and proofs of varying
rigor and varying generality with respect to assumptions on the noise models
have been given by various groups in 1997 [KLZ98a, KLZ98b, AB97, Kit97, Got98b].
(For a self-contained exposition see, e. g., Aharonov's PhD thesis [Aha99].) A
key point is that the resource requirements only seem to scale exponentially: At
coding level $\Lambda$, one needs $k^\Lambda$ physical qubits per logical qubit
if the code needs $k$ bare qubits per encoded qubit. However, as the error rate
$\varepsilon_\Lambda$ drops with doubly-exponential speed, one needs only a very
small coding level $\Lambda$ for very long computations. 5 It is easily shown
that the number of physical qubits and the number of gates is blown up only
poly-logarithmically.

3 Another possibility to allow for computations of arbitrary length is to use a
code whose structure is invariant under the number of physical qubits used to
encode a logical one and that hence can be blown up to arbitrary size. This is
the feature that makes topological codes [Kit03] useful.

4 In the parlance of fault-tolerance, an error which is not corrected and hence
causes the computation to finish with a wrong result is termed a "crash".

5 If the bare error rate is well (say, one order of magnitude) below the
threshold, two or three levels may be sufficient. [J. Taylor, pers. comm., see
also Steane's simulations mentioned later]
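The behaviour of the simplified concatenation model on either side of the
threshold can be checked numerically. The sketch below iterates the map from an
error rate to $c$ times its square, once per concatenation level, and compares
it with the closed form. The function names and the value of the constant `C`
are illustrative assumptions, and the clipping at 1 is added here only to keep
the result a probability.

```python
def concatenated_error_rate(eps0, c, levels):
    """Apply eps -> c * eps**2 once per concatenation level (clipped at 1)."""
    eps = eps0
    for _ in range(levels):
        eps = min(1.0, c * eps * eps)
    return eps


def closed_form(eps0, c, levels):
    """Closed form (c * eps0) ** (2 ** levels) / c of the unclipped recursion."""
    return (c * eps0) ** (2 ** levels) / c


C = 1.0e4   # illustrative constant; the threshold of this toy model is 1 / C

# Bare error rates a factor of two below and above the threshold 1e-4.
below = [concatenated_error_rate(0.5e-4, C, k) for k in range(5)]
above = [concatenated_error_rate(2.0e-4, C, k) for k in range(5)]
```

In this toy model the threshold is simply $1/c$: below it every additional
level squares the small quantity $c \varepsilon_0$, above it the error rate
grows until it saturates.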

The brief exposition just given has been substantially oversimplified. If
one wants to get a useful estimate for the value of the fault-tolerance
threshold, a multitude of factors must be taken into account. Let us list a few
of these complications:

• There is not just one physical error rate $\varepsilon_0$, but several. At the
very minimum, one should distinguish: (i) the probability that performing
a physical gate causes an error (gate error), (ii) the probability that a
qubit not affected by a gate suffers an error during the time step
corresponding to a gate performed somewhere else (storage error), and (iii)
the probability that the result of the measurement of a physical qubit
is invalid (measurement error). Furthermore, for a given physical
implementation of the quantum computer, the gate error rates can be very
different depending on the arity of the gate (i. e., the number of qubits it
acts on) or even on the type of gate. (In most implementations, single-qubit
rotations around some axes are easier to perform than others.)

• Some architectures have only limited capability to perform several 
gates simultaneously. (In the Cirac-Zoller ion trap proposal [CZ95], 
for example, only one gate per time step is possible. 6 ) This makes it 
difficult to establish the temporal order of all the physical gates and 
from this estimate the typical time that a qubit has to wait for syndrome 
measurement and error recovery. 

• With several concatenation levels, and due to the high degree of parallelism
required to keep storage errors low, controlling the quantum
computer becomes a very demanding task, and the capabilities of
the (classical) control electronics may become a limiting factor. For
instance, in semiconductor implementations, which have an especially
high clock frequency (i. e., very fast gates and short decoherence times),
it seems necessary to intersperse localized classical microprocessing
units between groups of qubits to control operations on the spot, as
the propagation time of control signals alone already makes control by a
central classical controller impossible [J. Taylor, pers. comm., see also
[TED+05]].

• In many architectures (e. g., in ion traps), a measurement takes several
orders of magnitude more time than a gate. This makes timing even
more difficult.

• While two-qubit gates on low coding levels may be simulated
straightforwardly, gates on high concatenation levels usually involve qubits far
apart. In virtually all proposals for quantum computers, the time to perform
such a gate depends on the distance. (Even the former standard
example for an architecture with distance-independent two-qubit gates,
the Cirac-Zoller ion trap, suffers from this due to the need for shuttling;
see footnote 6.)

6 On the other hand, a single ion trap is insufficient anyway, as the number of
ions per trap is limited, and a "shuttling" scheme [KMW02] is needed. Then, each
ion trap can work autonomously, but gates spanning two traps (which occur in
high-level corrections) require expensive shuttling.

In order to get a value for the fault-tolerance threshold, one has to fix the
choice of architecture, setup, quantum code, recovery and parallelization
schemes, a noise model and further assumptions. Then, one may follow two
ways: either an analytic or semi-analytic calculation, or a simulation.

One sets out to do a rigorous calculation by making pessimistic choices
for all assumptions in order to get a defensible result. Everything that cannot
be calculated exactly by analytical or numerical means is simplified in a
way that is sure to only lower the value of the result. This will yield a lower
bound on the threshold. The problem is that this bound may be several orders
of magnitude away from the real value. After fault-tolerance schemes
were found in 1997, most researchers first did calculations that were
conservative in the described sense and should hence be considered as lower
bounds. Knill et al. gave a threshold of $3 \cdot 10^{-6}$ for the [[7,1,3]]
code, assuming independent errors and a very general architecture [KLZ98a]
(elaborated in [KLZ98b]), and Kitaev [Kit97] and Aharonov and Ben-Or [AB97,
AB99] also get results around $10^{-6}$.

4.3 Threshold estimation by simulation 

The value of $10^{-6}$ sounded discouraging, and it was quickly realized that it
is overly pessimistic due to the fact that these calculations assume a syndrome
measurement on every logical qubit on every coding level after every step.
In particular, every logical qubit is treated in the same way no matter whether
a computational gate has acted on it or not. In the latter case, however, only
the storage error probability applies, which is likely to be much smaller than
the gate error probability. The cumulative effect of the back-action of the
many gates involved in syndrome measurement may then do more harm
than good on these qubits, and one fares better by measuring the syndrome
of "waiting" qubits less frequently. Furthermore, always doing error correction
on all levels might be counter-productive as well, as only the lower
levels accumulate errors fast. Finally, semi-analytic calculations usually
assume concatenation ad infinitum, which is neither realistic nor gives any hint
on the actual resource scaling for a bare error rate well below the threshold,
which is information as important as the threshold itself in order to choose
a fault-tolerance scheme (a point stressed especially in [Ste03]).

Hence, in order to get realistic values, it is important to take the actual
mode of operation (including all the points mentioned above and probably
even more) into account. This renders any explicit calculation of the
threshold, analytical or numerical, hardly feasible. 7 One has to resort to
simulation. A first attempt to do so resulted in an estimate of $10^{-4}$ for
the threshold [Zal96], obtained, however, with rather optimistic assumptions
such as the neglect of storage errors and a very simple treatment of two-qubit
gates. A more realistic simulation proved to be quite difficult, but was finally
achieved in an impressive work by Steane [Ste03], where he obtains numerical
values for the threshold for the gate error as a function of the storage error
rate and the time required for measurements for the CSS codes [[7,1,3]]
("Hamming") and [[23,1,7]] ("Golay"). He reaches the order of $10^{-3}$ for the
threshold, i. e., a higher value than Zalka's earlier result despite his more
realistic assumptions (however still neglecting distance dependency of gates),
primarily due to improved syndrome extraction procedures. This last point shows
the other value of simulations: they allow one to "experiment", to try out and
evaluate improvements to schemes. Steane cites as his main improvements the use
of the Golay code instead of the Hamming code and an improved ancilla
verification scheme.

7 Steane managed to get analytical results [Ste98a], which did take into account
some more constraints but still resulted in a lower bound rather than an
estimate.

Since Steane's work, only a few other detailed numerical simulations for
threshold estimation have been undertaken. Salas and Sanz studied the effect
of ancilla preparation quality and of parallelisation schemes [SS02, SS04],
using a technique similar to Steane's. Svore et al. simplified Steane's scheme
to an only semi-numerical approach and studied how the threshold is modified
due to the increasing distance between operand qubits when encoding blows
up the quantum register [STD05]. The problem of ancilla verification seems
crucial and the optimal solution may not yet have been found, as improvements
are still being suggested (e.g., very recently: [DA07]).

Possibly a major step forward was Knill's work [Kni05], where he combines
several recent ideas, most importantly ancilla factories for gate
teleportation [ZLC00] and post-selection, into a very sophisticated scheme for
fault tolerance, which allows, according to his numerical simulations, for a
fault-tolerance threshold of the order of $10^{-2}$. (A recent study [AGP07] for
this scenario, based on calculation rather than simulation, again has a sizable
gap between the lower bound it proves, namely $1.04 \cdot 10^{-4}$, and Knill's
estimate just stated.) While Knill's calculation assumes distance-independent
two-qubit gates, the work by Raussendorf and Harrington claims to reach the same
order of magnitude for a local architecture using very different techniques,
most notably topological codes [RH07].

4.3.1 Error tracking 

According to what we know, it seems highly unlikely that a quantum computer
can be simulated faithfully on a classical computer (at least not without
waiting exponentially long for a result). So, how can we nevertheless study
fault tolerance by simulation? The technique employed by Steane [Ste03] is
to track not the computational state of the computer but only the error state.
This is roughly done as follows: We set out to simulate a sequence of logical
(i. e., computational) gates in a given level of encoding. The gate on top level
is broken down into gates of the lower level, either transversally or according
to the construction for the encoded non-Clifford gate. This is tracked down
to the level of physical gates acting on physical qubits. According to the
chosen scheme of fault tolerance, at any level all or some of the gates have to
be followed by a syndrome measurement. The syndrome measurements require
fresh ancillae, also prepared to the respective encoding level. The operations
needed for this are broken down to the level of physical gates as well.
Whenever a physical gate is now simulated, its action on the physical qubits is
not actually evaluated, as we cannot track the actual quantum state of the
simulated quantum computer. The only thing done is to simulate an error in a
Monte-Carlo manner: with the given physical gate error probability, an error
happens, and if so, one of the possible Pauli errors is chosen at random,
and the physical qubit (or rather its representation in the simulating classical
computer) is marked with the chosen Pauli operator as being erroneous. It is
these error marks, and not the actual state, that the simulation keeps track of.
After each gate, all the qubits not affected by the gate may also be marked as
erroneous, but with the lower storage error probability. When a multi-qubit
gate acts on an erroneous qubit, the error is propagated, i. e., the other
operand qubit is marked as erroneous as well. According to the gate and its
action on Pauli operators under conjugation, the type of error may change in
the process.
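The propagation rule can be sketched in a few lines. This is not Steane's code:
the representation of an error mark as a pair of bits (x, z) and the function
names `propagate_cnot` and `propagate_hadamard` are assumptions chosen for
illustration, and overall phases are ignored, as they do not matter for error
marks.

```python
# Error marks as (x, z) bit pairs: I=(0,0), X=(1,0), Z=(0,1), Y=(1,1).
NAMES = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}


def propagate_cnot(marks, control, target):
    """Update error marks under conjugation by a CNOT gate."""
    xc, zc = marks[control]
    xt, zt = marks[target]
    marks[control] = (xc, zc ^ zt)    # a Z mark on the target spreads back
    marks[target] = (xt ^ xc, zt)     # an X mark on the control spreads forward


def propagate_hadamard(marks, qubit):
    """A Hadamard gate exchanges X and Z marks; Y stays Y."""
    x, z = marks[qubit]
    marks[qubit] = (z, x)


marks = [(1, 0), (0, 0)]              # X error on qubit 0, qubit 1 unmarked
propagate_cnot(marks, 0, 1)
after_cnot = [NAMES[m] for m in marks]
propagate_hadamard(marks, 1)
after_h = [NAMES[m] for m in marks]
```

The bit-pair encoding makes both the spreading of marks and the change of error
type a matter of a few XOR and swap operations per gate.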

Whenever measurements of physical qubits in the context of syndrome 
extraction operations are simulated, the actual result can be found despite 
the fact that the state of the qubits is unknown. This is because for the usual 
schemes, the syndrome is 0 in case of no errors, and hence, the measurement 
result is assumed to be 0 in case of no error or an irrelevant error (for
instance, an X measurement on a qubit marked with an X error, which commutes
with the measured observable, still gives 0) and 1 for
a relevant error. (If one wishes to account for measurement errors, too, one 
may flip the result with a given probability.) From the syndrome, the
appropriate recovery operation is determined and applied. If, for example, it was
determined that a certain qubit needs to be phase-flipped (i. e., a Z operation 
has to be applied) and the qubit was in fact marked as Z erroneous, this error 
mark is cleared. If it was not marked as erroneous, the recovery introduces 
an error: The qubit is now marked Z-erroneous. 
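A minimal sketch of these two bookkeeping steps follows. The function names
`measurement_outcome` and `apply_recovery` and the use of the strings "I", "X",
"Y", "Z" as error marks are assumptions made for illustration. A mark counts as
relevant for a measurement exactly when it anticommutes with the measured Pauli
operator.

```python
def measurement_outcome(mark, basis):
    """Return 1 if the error mark flips a measurement in `basis`, else 0.

    `mark` is one of "I", "X", "Y", "Z"; `basis` is "X" or "Z". The identity
    and the measured operator itself commute with the measurement and are
    therefore irrelevant; the two remaining Pauli operators flip the result.
    """
    return 0 if mark in ("I", basis) else 1


def apply_recovery(mark, correction):
    """Combine an error mark with a Pauli correction, ignoring phases."""
    if mark == correction:
        return "I"                 # the correction clears the mark
    if mark == "I":
        return correction          # an unneeded correction introduces an error
    if correction == "I":
        return mark
    # Two different non-identity Pauli operators multiply to the third one.
    return ({"X", "Y", "Z"} - {mark, correction}).pop()


cleared = apply_recovery("Z", "Z")       # marked qubit, matching correction
introduced = apply_recovery("I", "Z")    # unmarked qubit, spurious correction
```

Measurement errors could be added on top of this by flipping the returned
outcome with a given probability.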

In this fashion, one can keep track of the appearance, propagation and
clearing not only of bare errors but also of errors on the encoded levels. On
the low levels, faulty or insufficient recovery operations may be common, but
on the highest level, they must always lead to an error-free logical qubit. If
an error is propagated to the highest level and not corrected, the simulated
quantum computer is deemed "crashed". By performing many simulations
and keeping track of the frequency of crashes, one obtains the crash
probability, which (for a sufficiently long simulated computation) may be
expected to be either close to 0 or close to 1, depending on the bare error
probability. The threshold is then obtained as the value of the bare error
probability for which the simulation result changes, presumably rather abruptly,
from close to 0 to close to 1.

• a bit flip (NOT gate, $X$) and a phase flip ($Z$)
• a CNOT ($\Lambda X$) and a controlled phase flip ($\Lambda Z$)
• The backslash indicates that the line represents a bundle of 7 qubits.
Every gate is to be thought of as divided into 7 copies, each acting on
one qubit. If a gate is not divided, but acts jointly on all qubits as a
7-qubit gate, this is indicated by a curly brace.
• a measurement and a classically conditioned gate (it is only carried
out if the measurement outcome was 1)

Table 4.1: Legend to the symbols used in the quantum circuits shown in this
chapter. (All circuits drawn with the LaTeX package Q-circuit [FE04].)

4.4 Threshold estimation with Clifford simulations 

4.4.1 Rationale 

Numerical calculations, especially when based on involved algorithms, are 
much harder to check independently than analytical reasoning. For example, 
the results of the simulations cited so far have been met with scepticism in 
the community due to a perceived lack of opportunity for double-checking. 

In this chapter I would like to propose a way to simulate fault-tolerant
quantum computing in a fashion different from the error-tracking scheme just
described. While the proposal might not be as fast as error tracking, it may
offer useful advantages in reliability, as it allows one to check certain
reasoning that so far has been hard-coded into the simulation and was hence not
subject to the simulation's scrutiny. Also, the same feature allows greater
flexibility in trying out different fault-tolerance schemes without any need to
change the simulation code.

For simplicity, I shall use for this exposition the original schemes for
the [[7,1,3]] CSS code described in [Ste96b, CS96] in the form explained in
Preskill's reviews [Pre98a, Pre98c], even though more efficient schemes are
known by now. Fig. 4.1 shows the network for syndrome extraction and
recovery from error that should be applied to any encoded qubit after it has
been exposed to noise, i. e., after a computational gate has been performed
involving this qubit, or after the qubit has been untouched for a while. To
read the figure, consult the legend in Table 4.1. The circuit works as follows:
A logical 0 is encoded on a set of 7 bare ancillary qubits, which are coupled
to the data by a CNOT gate, followed by a Hadamard gate. (Recall that these are
both transversal gates.) According to the construction of the [[7,1,3]] code,
each of the 7 bare qubits of the ancilla should be in the $|0\rangle$ state
afterwards. If one of the 7 parallel $Z$ measurements yields a 1, this indicates
that the corresponding bare qubit in the encoded data qubit has suffered a bit
flip ($X$) error, and as remedy, an $X$ operation is performed on it.
Unfortunately, the error might have been introduced into the ancilla by the CNOT
or the Hadamard gate, so that this correction actually introduces an error
instead of correcting one. This source of additional errors is suppressed
quadratically by requiring the 1 to be measured twice on two independently
prepared ancilla $|0\rangle$ states. In the same way, two encoded $|0\rangle$
ancillae are prepared, first subjected to a Hadamard and then coupled to the
data with a CNOT (now in opposite direction) to detect and correct for phase
flip ($Z$) errors.

Figure 4.1: Steane's recovery scheme for CSS codes. ("Vero" means an ancillary
0, freshly prepared and verified with the circuit of Fig. 4.2.)

Even with this double checking, we still cannot expect that the correction
extracts more noise than it adds. This is because each of the ancillae may
contain errors which are propagated onto the data qubit due to the back-action
of the CNOT gates. This possibility has to be suppressed quadratically as well
by verifying twice that the ancilla is really in the $|0\rangle$ state before
allowing it to interact with the data. Fig. 4.2 shows a standard version of the
preparation and verification network (still following [Pre98a, Pre98c]).

Figure 4.2: (a) Preparation of an encoded $|0\rangle$. (b) Verification of the encoded
ancilla (using up two further encoded zeros) for use as ancilla for Steane recovery.

Can we now be sure that this scheme removes errors which appear with a
rate $\varepsilon_0$ but introduces new errors only with a rate of $O(\varepsilon_0^2)$, as needed to make
code concatenation work? Yes: the networks can be rigorously checked by
paper-and-pencil calculation, as explained in the cited references. But what
is the exact probability that a qubit that initially was erroneous with probability
$\varepsilon_0$ is erroneous after the circuit? How does this depend precisely on the
possibly different error rates for the various gates, and how on the preparation
and the measurement noise? As argued earlier, the order in which the operations
are performed may make a huge difference unless we neglect storage
errors. Getting a closed formula for this might still be possible, but is surely
very laborious and extremely error-prone. If we want to include interactions
between different coding levels, things only get more difficult. So we had better
resort to a numerical simulation: After every time step, we draw for each qubit
a number from a pseudo-random number generator (with range [0; 1]), and if
the number is below the error rate (which depends on whether the qubit was
idle or participated in a gate, and in the latter case also on the kind of gate),
one of the "error operators" X, Y and Z is applied to it. 8
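The noise rule just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the uniform choice among the non-identity Pauli products is an assumption, and the Paulis are represented merely as labels rather than being applied to a simulated register.

```python
import itertools
import random

PAULIS = ("I", "X", "Y", "Z")


def nontrivial_paulis(n_qubits):
    """All tensor products of n single-qubit Paulis except the identity."""
    return [p for p in itertools.product(PAULIS, repeat=n_qubits)
            if any(label != "I" for label in p)]


def sample_error(n_qubits, error_rate, rng):
    """Return None (no error event) or a tuple of Pauli labels to apply.

    An error event happens with probability `error_rate`; the error is then
    drawn uniformly from the non-identity Pauli products on the gate's qubits.
    """
    if rng.random() >= error_rate:
        return None
    return rng.choice(nontrivial_paulis(n_qubits))
```

For a single-qubit gate this yields X, Y or Z, for a two-qubit gate one of the 15 products mentioned in footnote 8; a call such as `sample_error(2, 1e-3, random.Random())` would be made once after each simulated two-qubit gate.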

This is the way used by Steane to simulate noise. As described above, in
Steane's scheme, the path of the error is then "tracked". For this tracking,
one needs to supply expectations about where the errors become manifest
(are measured) and where they get corrected. This information comes from
an understanding of the functioning of the network, namely when which
conditioned correction operations happen and in which cases they actually
succeed in correcting the error. In other words, the reasoning why and how
the correction network operates has to be put into the simulation by the programmer.
The program does not notice by itself that a certain measurement
is a syndrome measurement, or that a certain operation clears an error.

8 After two-qubit gates, we may choose one of the 15 possibilities in
$\{1, X, Y, Z\} \otimes \{1, X, Y, Z\} \setminus \{1^{\otimes 2}\}$. This adds the possibility of multi-qubit gates introducing correlated errors
in a most straightforward way.

It would be desirable if the correcting behaviour emerged from the simulation,
i. e., if the error correction happened "automatically" simply because
an error-correction network is simulated. This should be most useful if one
uses complicated schemes or wishes to study effects of code subtleties 9 , and
would free the programmer from the very difficult and error-prone task of
putting in the recovery logic "manually". Furthermore, an automatic treatment
of recovery logic may save considerable time for a researcher who
wishes to compare the performance of different codes and recovery circuits
under varying constraints and parameters for noise, geometry and other factors,
or who even seeks to find new or improved recovery networks. At first
glance, such an automatism seems impossible because it amounts to simulating
the actual action of the gates involved in the error correction. Yet, as
these gates are all Clifford gates, a simulation with the help of a stabilizer
simulator should be possible. 10 The main problem is that while the gates of
the syndrome measurement and the error correction parts of the network are
all Clifford gates, the computational gates typically are not and hence drive
the simulated computer out of the space of stabilizer states. In the following
section we look at the simple case of a computational network consisting only
of Clifford gates, and then we shall study a simple scheme to deal with non-Clifford
gates and argue that it may be expected to give reasonably accurate
results despite its simplicity.

4.4.2 Simulation of pure Clifford networks 

As long as the computational network (i. e., the network of gates acting on
the actual data as represented by the top coding level) consists only of Clifford
gates, the whole physical network will also consist only of physical
Clifford gates and can be fully simulated on a classical computer. This is because,
at least for CSS codes, the circuits for encoding, syndrome extraction,
and recovery can always be built using only Clifford gates, and because the
computational Clifford gates can be broken down into transversally applied
physical Clifford gates. The full physical circuit is then simulated, with errors
being put in at random with specified probabilities in order to do a Monte-Carlo
style simulation. At the very end of each run, the result is measured,
decoded and compared to the expected result. 11

9 To give a well-known example for such a subtlety: The [[7,1,3]] CSS code is usually said to
be able to correct but one error. Yet, if two different bare qubits are affected by different errors
(one qubit by an X error, the other one by a Z error), the network of Fig. 4.1 can recover the
state.

10 I may not be the first to suggest such an approach. Knill may have used similar techniques
to obtain the results presented in [Kni05], as seems to be implied in the supplemental material
to the article. One should, however, expect that the scaling of simulation techniques following
Aaronson and Gottesman [AG04] may have been an obstacle. This limitation was, after all,
the motivation to develop GraphSim.

11 Note that in case of a complicated computational circuit, the Clifford simulator (without
encoding, and with noise set to 0) can also be used to find out what result to expect.

Fig. 4.3 shows the results of such a simulation for a quite simple
computational network: A logical qubit is initialized as $|0\rangle$ (according
to Fig. 4.2a), and then 8 consecutive encoded transversal Hadamard gates
are applied, each gate followed by a full recovery network according to Fig.
4.1. Afterwards, a logical Z measurement is simulated. If the result (after
post-measurement error correction) is not 0, the computation is considered
as crashed. The plot shows this fault probability for different values of the
bare error probability $\varepsilon_0$. After each physical gate (including the gates within
the recovery circuits, of course), an error event happens with probability $\varepsilon_0$.
In case of a single-qubit gate, an extra X, Y or Z operation is then applied,
each with probability 1/3. For a two-qubit gate, one of the 15 possible errors
in $\{1, X, Y, Z\}^{\otimes 2} \setminus \{1^{\otimes 2}\}$ is chosen uniformly at random. Each measurement reports,
with the same probability, the wrong result. Storage errors have been
set to 0 for this simple example.

Figure 4.3: Simulation of fault-tolerant performance of 8 consecutive
Hadamard gates ("8x FT Hadamard") at different bare error rates and coding levels. The curves
seem to cross roughly at a bare error probability a bit less than $3 \cdot 10^{-4}$. (For
a doubly-concatenated code (green curve) the calculation gets expensive, and
hence I have made considerably fewer Monte Carlo runs for this first try, resulting
in a much higher statistical inaccuracy. This may have caused the
outlier to the left-hand side; note that the error bars denote only one standard
deviation and an error larger than indicated is likely.)

The black curve shows the fault probability for a non-encoded circuit,
i. e., for simply 8 consecutive bare Hadamard gates applied to a qubit initially
in the $|0\rangle$ state, followed by a single Z measurement. Disregarding the possibility
of two errors canceling each other, we get a fault probability of roughly
$10\,\varepsilon_0$, as there are 10 operations. In fact, the doubly logarithmic plot shows a
straight line, indicating a simple power-law dependence. The red curve is
for a single code layer (i.e., 7 bare qubits, no code concatenation). We
see that for a bare error probability $\varepsilon_0$ below approx. $3 \cdot 10^{-4}$ the coding and
error correction succeed in lowering the fault probability, while above this
threshold, the error recovery network introduces more noise than it corrects
for. The green curve is for a two-layer code (i.e., $7^2 = 49$ qubits, code concatenated
once). The green curve should be expected to cross the other two
curves roughly at the same point. 12 Within the specified $1\sigma$ error bars, this
seems to be roughly the case. So far, the calculation seems to reproduce the
initially quoted rough results of a fault-tolerance threshold slightly above
$10^{-4}$ for a straightforward implementation of the concatenated [[7,1,3]] code, neglecting
storage errors, with syndrome measurements on all levels after every
step.
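
The rough estimate for the unencoded circuit can be spelled out as follows. This is a sketch under the same simplification as in the text, namely that each of the 10 operations fails independently with probability $\varepsilon_0$ and that any single error event already counts as a fault; the symbol $P_\text{fault}^\text{bare}$ is introduced here only for illustration.

```latex
P_\text{fault}^\text{bare}(\varepsilon_0) \approx 1 - (1 - \varepsilon_0)^{10}
  = 10\,\varepsilon_0 - 45\,\varepsilon_0^2 + O(\varepsilon_0^3),
\qquad
\log P_\text{fault}^\text{bare} \approx \log 10 + \log \varepsilon_0
  \quad (\varepsilon_0 \ll 1).
```

The second relation is a straight line of slope one in a doubly logarithmic plot, which is the behaviour described for the black curve.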

12 Only roughly, not exactly, due to subtle "interactions" (such as error cancellation) between
code layers.

4.4.3 Generation of encoded circuits

Even the simple eight-Hadamards network described above blows up to considerable
size when encoded at two levels. Together with the gates of the
recovery networks on both coding levels (and with the lower-level recovery
circuits for each gate in the higher-level recovery circuits), we get more
than a thousand physical gates. Obviously, such a network is too cumbersome to
implement manually; one should generate it by a computer program. This
can be done in an elegant fashion by modeling code concatenation and error
correction with object-oriented design. (A reader not familiar with the
terminology of object-oriented programming may want to skip to the next
subsection.) An abstract class qubit is used to represent a qubit and has two
concrete subclasses, phys_qubit for a physical (bare) qubit and enc_qubit for
an encoded qubit. An enc_qubit has an array field sub_qubits with references
to the lower-level qubits it is comprised of, while a phys_qubit simply
knows its index (i. e., position) within the quantum register simulated by
GraphSim. For all commonly used transversal Clifford gates, qubit provides
an abstract method. In phys_qubit's implementation of this method, the corresponding
method of GraphSim's GraphRegister is called, and then a random
choice is made with the given bare error probability to decide whether
an error event happens. In the error case, the chosen error event is realized
by calling GraphSim's methods for Pauli operations on the affected qubits.
In enc_qubit's implementations for the transversal Clifford operations, the
same method is called for the sub_qubits, which hands the gate recursively
down the code layers. After calling the sub_qubits' methods, the method
calls the recovery method, which executes the full recovery circuit of Fig. 4.1
by creating ancillae, performing all the gates and measurements (by calling
the relevant methods of the ancilla enc_qubits on the same coding level) and
the indicated recovery operations. Measurements are implemented in the same
fashion.
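
The class structure described above might look roughly as follows in Python. This is a minimal sketch, not the code used for the thesis: `RecordingRegister` is a hypothetical stand-in that merely logs gate calls instead of driving a real stabilizer simulator, only the Hadamard gate is shown, the block size of 7 is assumed from the [[7,1,3]] code, and the recovery method is left as an empty placeholder.

```python
import itertools
import random
from abc import ABC, abstractmethod


class RecordingRegister:
    """Hypothetical stand-in for the simulator's register: it only logs calls."""

    def __init__(self):
        self.log = []

    def hadamard(self, index):
        self.log.append(("H", index))

    def pauli(self, label, index):
        self.log.append((label, index))


class Qubit(ABC):
    @abstractmethod
    def hadamard(self):
        """Apply a (logical) Hadamard gate to this qubit."""


class PhysQubit(Qubit):
    def __init__(self, register, index, error_rate, rng):
        self.register, self.index = register, index
        self.error_rate, self.rng = error_rate, rng

    def hadamard(self):
        self.register.hadamard(self.index)
        if self.rng.random() < self.error_rate:  # error event after the gate
            self.register.pauli(self.rng.choice("XYZ"), self.index)


class EncQubit(Qubit):
    def __init__(self, sub_qubits):
        self.sub_qubits = sub_qubits

    def hadamard(self):
        for qubit in self.sub_qubits:  # transversal gate, handed down recursively
            qubit.hadamard()
        self.recover()

    def recover(self):
        """Placeholder: the recovery circuit of Fig. 4.1 would be run here."""


def build_qubit(level, register, indices, error_rate, rng, block_size=7):
    """Build a qubit encoded `level` times; level 0 is a physical qubit."""
    if level == 0:
        return PhysQubit(register, next(indices), error_rate, rng)
    return EncQubit([build_qubit(level - 1, register, indices, error_rate, rng, block_size)
                     for _ in range(block_size)])
```

Calling `build_qubit(2, RecordingRegister(), itertools.count(), 1e-4, random.Random())` creates a doubly encoded qubit; a single call to its `hadamard` method then reaches all 49 physical qubits. Because the recovery placeholder is empty, the gate count in this sketch does not include any recovery gates.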

As the recovery method creates ancillae and then disposes of them, it is
useful to implement a simple resource management for the quantum register.
This is most conveniently done in the style of a primitive heap: When an
encoded $|0\rangle$ ancilla is needed, the necessary physical qubits are "allocated",
and once the ancilla has been measured, they are "freed". The heap manager
just keeps track of which physical qubits are not in use ("free"). For
an allocation, a free qubit is chosen arbitrarily, marked as "not free", set to
$|0\rangle$ and provided. When it is freed, it is simply marked "free" again. Note
that this simple scheme is viable only if one simulates a quantum computer
whose performance at multi-qubit gates does not depend on the distance of
the operand qubits, i. e., there are no geometry considerations. However,
all proposed implementations do have geometry dependences, and the fault-tolerance
threshold of a given scheme can depend strongly on the strategy
used to allocate ancilla qubits. (See e. g. [Ste02] for a discussion of geometry optimization.)
Hence, the simple scheme just sketched may be replaced by various
more complex schemes, which allows one to compare their performance. Such simulations
may turn out to be of high importance once one considers detailed
quantum computer designs, and the possibility that they can be studied with
a Clifford simulator in the proposed manner might then turn out to be most
valuable.
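
A primitive heap of this kind could be sketched as below. The class name `QubitPool` and the optional `reset` callback (standing in for whatever call sets a simulated qubit to $|0\rangle$) are assumptions made for illustration; no geometry is taken into account, as in the simple scheme of the text.

```python
class QubitPool:
    """Keeps track of which physical qubits of a register are free."""

    def __init__(self, size, reset=None):
        self.free = set(range(size))
        self.reset = reset  # optional callback that sets a qubit to |0>

    def allocate(self, count=1):
        """Hand out `count` arbitrary free qubits and mark them as in use."""
        if count > len(self.free):
            raise RuntimeError("quantum register exhausted")
        indices = [self.free.pop() for _ in range(count)]
        if self.reset is not None:
            for index in indices:
                self.reset(index)
        return indices

    def release(self, indices):
        """Mark qubits as free again, e.g. after the ancilla was measured."""
        for index in indices:
            if index in self.free:
                raise ValueError(f"qubit {index} is already free")
            self.free.add(index)
```

A geometry-aware allocation strategy would replace the arbitrary `self.free.pop()` by a choice that depends on the positions of the operand qubits.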

4.4.4 Notes on Monte Carlo statistics 

So far, we have considered the naive strategy of drawing after each physical
gate a (pseudo-)random number from [0; 1] and, if it is below $\varepsilon_0$, simulating
a Pauli error. However, as can be seen from Fig. 4.3, the values of the bare
error probability $\varepsilon_0$ for which the simulation needs to be performed can be
quite low. Hence, even for a larger number $L$ of gates (for the level-2 curve
in the 8-Hadamard example, $L > 1000$), the probability that all these gates
perform without any bare error, namely $(1 - \varepsilon_0)^L$, is quite high. For example,
for $\varepsilon_0 = 10^{-4}$ and $L = 10^3$, 90% of all simulation runs will not contain a
single error event, and hence are identical. This shows that this approach is
very inefficient.
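
How large the fraction of error-free (and hence identical) runs is for given parameters can be checked with a short calculation. The helper below is a sketch with a hypothetical name; it evaluates $(1 - \varepsilon_0)^L$ via logarithms so that it stays accurate for very small error rates.

```python
import math


def error_free_fraction(eps, num_gates):
    """Probability that none of `num_gates` gates suffers an error event."""
    return math.exp(num_gates * math.log1p(-eps))


if __name__ == "__main__":
    for num_gates in (10**3, 10**4, 10**5):
        print(num_gates, error_free_fraction(1e-4, num_gates))
```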

A better solution uses the binomial distribution and works as follows:
Let us assume that the computation contains at most $L$ gates. (Note that
the number of gates depends on the occurring errors, as these trigger the
insertion of gates for error recovery according to the measured syndrome.
Hence, we should choose $L$ so large that the probability of the actual number
of gates exceeding $L$ is so small that these cases may safely be neglected.)
Then, the probability of $k$ errors happening within these $L$ possible "error
locations" is given by

$$p(k) = \binom{L}{k} \varepsilon_0^k (1 - \varepsilon_0)^{L-k}.$$

Not all of the $k$ errors actually occur within a computation as some may be
beyond the actual number of gates. We now let $k$ take a sequence of values,
keeping the value fixed for a number of Monte Carlo runs. For every run, we
choose, uniformly at random, $k$ error locations from the set $\{1, 2, \ldots, L\}$ of
error locations. Within the run, each gate increases a global counter and then
reads off the counter to see which position it (the gate) has within the simulation
sequence of all physical gates. Instead of drawing a random number
from [0; 1], the gate looks up whether its position (as read off from the
counter) is one of the $k$ error locations drawn at the start of the run, and if
so, simulates a Pauli error. Let us say that of the $n(k)$ Monte Carlo runs simulated
for a given value of $k$, a certain number $\nu(k)$ resulted in a fault, i. e.,
a wrong end result. If the actual probability that a fault occurs for $k$ chosen
error locations is $f(k)$, then $\nu(k)/n(k)$ is an unbiased estimator for $f(k)$,
and the uncertainty of this estimate is given by $\sqrt{f(k)(1 - f(k))/n(k)}$. (This
is because the probability that within the $n(k)$ runs one finds $\nu(k)$ faults is
given by $\binom{n(k)}{\nu(k)} f(k)^{\nu(k)} (1 - f(k))^{n(k) - \nu(k)}$. This binomial distribution of $\nu(k)$ has a
variance of $n(k) f(k) (1 - f(k))$, so that the frequency $\nu(k)/n(k)$ has a variance of $f(k)(1 - f(k))/n(k)$.)

This then allows us to estimate the probability of a fault under independent
errors as a sum of the frequencies $\nu(k)/n(k)$ weighted by the binomial probabilities
$p(k)$:

$$P_\text{fault}(\varepsilon_0) = \sum_{k=0}^{L} p(k) \frac{\nu(k)}{n(k)}. \qquad (4.1)$$

The uncertainty of this estimate is, at the $1\sigma$ level,

$$\Delta P_\text{fault}(\varepsilon_0) = \left[ \sum_{k=0}^{L} \frac{p^2(k)}{n(k)} \, \frac{\nu(k)}{n(k)} \left( 1 - \frac{\nu(k)}{n(k)} \right) \right]^{1/2}. \qquad (4.2)$$
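
The weighted estimate and its statistical uncertainty can be sketched in Python as follows. All function names are hypothetical. The uncertainty uses the variance $f(1-f)/n$ of a binomial frequency, with the measured frequency inserted for the unknown $f$, and the binomial weights are evaluated through logarithms so that large numbers of error locations pose no numerical problem.

```python
import math
import random


def binomial_weight(k, num_locations, eps):
    """p(k): probability of exactly k error events among num_locations."""
    log_p = (math.lgamma(num_locations + 1) - math.lgamma(k + 1)
             - math.lgamma(num_locations - k + 1)
             + k * math.log(eps) + (num_locations - k) * math.log1p(-eps))
    return math.exp(log_p)


def draw_error_locations(k, num_locations, rng):
    """Choose k distinct positions from {1, ..., num_locations}."""
    return set(rng.sample(range(1, num_locations + 1), k))


def fault_estimate(eps, num_locations, faults, runs):
    """Weighted fault estimate and its 1-sigma uncertainty.

    faults[k] and runs[k] hold nu(k) and n(k); values of k that are absent
    from the dictionaries are simply left out of the sums.
    """
    total, variance = 0.0, 0.0
    for k, n in runs.items():
        freq = faults[k] / n
        p = binomial_weight(k, num_locations, eps)
        total += p * freq
        variance += p * p * freq * (1.0 - freq) / n
    return total, math.sqrt(variance)
```

Within a run, a gate at position `pos` would then simulate a Pauli error exactly if `pos` is contained in the set returned by `draw_error_locations`. A useful property of this scheme is that the same table of counts can be re-weighted for any value of the bare error rate without new simulation runs.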


It is easy to see that many of the terms in Eq. (4.1) are negligible. Choosing
a cut-off $k_\text{cut}$, we may argue that for $k > k_\text{cut}$, $p(k)$ is so small that $P_\text{fault}$ does
not change significantly if we replace all the frequencies $\nu(k)/n(k)$ with $k > k_\text{cut}$
by $\nu(k_\text{cut})/n(k_\text{cut})$, i.e.

$$P_\text{fault}(\varepsilon_0) = \sum_{k=0}^{k_\text{cut}} p(k) \frac{\nu(k)}{n(k)} + \underbrace{\frac{\nu(k_\text{cut})}{n(k_\text{cut})} \sum_{k=k_\text{cut}+1}^{L} p(k)} + O(\cdot). \qquad (4.3)$$

The rest term $O(\cdot)$ can even be bounded easily. In the stochastic limit of
many Monte Carlo runs, the frequencies decrease strictly monotonically,
$\nu(k+1)/n(k+1) < \nu(k)/n(k)$, and hence, the rest term is negative and
its modulus is strictly smaller than the under-braced term in Eq. (4.3). The
purpose of this approximation is easy to see. Once one chooses a cut-off $k_\text{cut}$
large enough such that the approximation error $|O(\cdot)|$ is small compared to
the value of $P_\text{fault}$, one knows that it suffices to run Monte Carlo simulations
only for $k \le k_\text{cut}$. The cut-off error is simply added to the statistical error of
Eq. (4.2) in order to give the full error to be denoted with error bars.
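
The cut-off procedure can be sketched as follows, again with hypothetical names. The function returns the truncated estimate together with the tail term, i. e. the last measured frequency multiplied by the total binomial weight of all $k > k_\text{cut}$, which is the quantity used in the text to bound the truncation error.

```python
import math


def binomial_weight(k, num_locations, eps):
    """p(k): probability of exactly k error events among num_locations."""
    log_p = (math.lgamma(num_locations + 1) - math.lgamma(k + 1)
             - math.lgamma(num_locations - k + 1)
             + k * math.log(eps) + (num_locations - k) * math.log1p(-eps))
    return math.exp(log_p)


def truncated_fault_estimate(eps, num_locations, freqs):
    """Estimate with cut-off; freqs[k] = nu(k)/n(k) for k = 0, ..., k_cut.

    Returns (estimate, tail_term), where tail_term is the contribution of
    all k > k_cut with their frequencies replaced by freqs[k_cut].
    """
    k_cut = len(freqs) - 1
    weights = [binomial_weight(k, num_locations, eps) for k in range(k_cut + 1)]
    head = sum(p * f for p, f in zip(weights, freqs))
    tail_weight = max(0.0, 1.0 - sum(weights))  # total weight of k > k_cut
    tail_term = freqs[k_cut] * tail_weight
    return head + tail_term, tail_term
```

Increasing the length of `freqs` until `tail_term` is small compared to the estimate gives a simple practical criterion for choosing the cut-off.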

If $k_\text{cut}$ has to be chosen large, one may not want to wait for all $k_\text{cut}$ sets of
Monte Carlo runs to have finished. A solution is then to skip some runs, i. e.,
to only perform runs for $k \in \{\Delta k, 2\Delta k, 3\Delta k, \ldots, k_\text{cut}\}$. The missing frequencies
$\nu(k)/n(k)$ can then be replaced by the frequencies for the next-lower value of
$k$ actually calculated, in a manner similar to that used in the previous paragraph.
Again, the truncation error thus introduced can be bounded and made
small by choosing $\Delta k$ not too large. Note that care must be taken to adjust Eq.
(4.2) to the case of $\Delta k > 1$: Uncertainties that occur several times in the sum must
not be added in quadrature but linearly, as they are not independent sources
of error.
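
Filling in the skipped values of $k$ might be done as in the following sketch. The function name is hypothetical, and the frequency used below the first simulated value of $k$ is set to zero, which is an assumption made here (it is exact for $k = 0$ if the noiseless circuit gives the correct result).

```python
def fill_missing_frequencies(computed, k_cut):
    """Return frequencies for k = 0, ..., k_cut from a sparse table.

    `computed` maps the simulated values of k to nu(k)/n(k); each missing
    value is replaced by the frequency of the next-lower simulated k.
    """
    filled, last = [], 0.0  # assumption: no faults below the first simulated k
    for k in range(k_cut + 1):
        if k in computed:
            last = computed[k]
        filled.append(last)
    return filled
```

The resulting list can be used wherever a complete table of frequencies up to the cut-off is expected.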

In practice these two truncation techniques allow for a saving in comput¬ 
ing time by orders of magnitude without increasing the error a lot. 

4.4.5 Treatment of non-Clifford gates 

One may argue that the whole scheme presented so far is unable to provide
realistic estimates for fault-tolerance thresholds because it does not allow one to
simulate computational networks with non-Clifford gates. This seems indeed
to be a severe objection, because a quantum computer without non-Clifford
gates is of no use due to the Gottesman-Knill theorem, and the presence of
non-Clifford gates does in fact reduce the fault-tolerance threshold.

The reason that the code fault probability is increased by non-Clifford
gates seems to be, however, not related to the fact that they cannot be simulated
efficiently on a classical computer. 13 The reason is rather that non-Clifford
gates cannot be implemented transversally. For a transversal gate,
the logical qubit is in the code space before the gates on the constituent qubits
have been performed and again in the code space afterwards. Hence, if the
sub-gates are performed simultaneously, the qubit leaves the code space only
momentarily or even not at all, and thus always enjoys the protection of the
code on the given concatenation level. In a non-transversal gate, however,
the sub-gates (i. e., the gates on the level of the constituent qubits) cause the
logical qubit to leave the code space until the whole procedure is finished.

Fig. 4.4a shows Shor's construction [Sho96] of a fault-tolerant Toffoli gate 
(which is actually an instance of the general gate-teleportation scheme found 
later by Gottesman and Chuang [GC99]): The ancillary "cat state" 

$$|\text{cat}\rangle := \frac{1}{\sqrt{2}} \left( |0\rangle^{\otimes 7} + |1\rangle^{\otimes 7} \right)$$

is not a valid code word, and hence, the $|0\rangle$ ancillae leave the code space
as soon as they are coupled to the cat state by the transversal bare CNOT
and Toffoli gates in the parentheses. As we can no longer vouch for
their correctness by syndrome checking, the ancillae are verified by the parity
measurement on the cat state, done twice as always with fault-tolerant
verifications. Even though the ancillae are now verified to be correct, they
are no longer in the code space and cause the operand qubits to leave the
code space as well after they interact with them due to the three transversal
CNOTs. Then, the operand qubits are measured, teleporting the result to
the ancillae, which now become data qubits. These data qubits now undergo
conditioned transversal correction operations. Only after these may the logical
data qubits be expected to be in the code space again, and only then can a
syndrome measurement (denoted "SR" in the circuit) be performed.

To summarize: In a transversal gate, a syndrome measurement can be 
performed immediately after each constituent qubit has been touched by just 
a single gate. In the Toffoli construction, however, each constituent qubit is 
touched by up to five gates before its syndrome can be checked for the next 
time, and hence, the probability that more than one error has occurred and 
the recovery thus fails is significantly increased. 

13 "Is not related" might be too strong a statement. A subtle relation certainly does exist
between the Gottesman-Knill theorem and the actual reason given now, but I strongly doubt
that this invalidates the reasoning to be exhibited in this section.

It seems reasonable to assume that this is by far the dominant reason why
non-transversal gates reduce the fault-tolerance threshold. The idea is now
that the effect of this delay in syndrome measurement can be faithfully modelled
by a purely Clifford network. We simulate, instead of the Toffoli construction
of Fig. 4.4a, a network which is equally complex but contains only
Clifford gates. In Fig. 4.4b, such a network is shown, called a "placebo Toffoli".
In the place of the bare Toffoli gate, it performs a three-qubit Clifford
gate. This could be any Clifford gate; here, the one obtained from a sequence
of two CNOTs has been chosen. After the simulation of this three-qubit gate
(which is a sequence of two gates for the underlying Clifford simulator but
considered as one gate within the fault-tolerance simulation), a noise event is
simulated with the usual probability, but the Pauli operation is now chosen
from $\{1, X, Y, Z\}^{\otimes 3} \setminus \{1^{\otimes 3}\}$. If a concatenated code is simulated, the lower-level
double-CNOT is not simulated by performing the CNOTs transversally but
by recursively performing the whole placebo network on the lower level (unless
the lower level is the physical level, of course). As in the real network,
the simulator abstains from doing any syndrome measurements after any of
the transversal gates even though the logical qubits do not leave the code
space. The conditional operations that finish the teleportation (marked with
dashed boxes in Fig. 4.4) are different from those in the real Toffoli case. They
have to be changed such that the placebo Toffoli network actually does perform
the placebo Toffoli operation, i. e., the double CNOT, on the upper level.
Nevertheless, after the conditional correction, one may simulate error events
as if the two-qubit correction operations of the Toffoli case had been performed
and not just the simpler local operations in the placebo network. This
ensures that correlated errors are introduced with the same rate as in the real
Toffoli network.
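
That a double-CNOT gate stays within the reach of a Clifford simulator can be illustrated by following Pauli operators through it. The sketch below makes two assumptions that are not taken from the text: the two CNOTs are taken to act from qubits 0 and 1 onto qubit 2 (mirroring the two controls and one target of a Toffoli gate), and signs of the Pauli operators are ignored. A Pauli operator is stored as a pair of bit tuples marking the qubits that carry an X and a Z component, respectively.

```python
def cnot(pauli, control, target):
    """Conjugate a Pauli, given as (x_bits, z_bits), by CNOT(control -> target)."""
    x, z = list(pauli[0]), list(pauli[1])
    x[target] ^= x[control]  # an X on the control spreads to the target
    z[control] ^= z[target]  # a Z on the target spreads to the control
    return tuple(x), tuple(z)


def placebo_gate(pauli):
    """Assumed placebo gate: CNOT(0 -> 2) followed by CNOT(1 -> 2)."""
    return cnot(cnot(pauli, 0, 2), 1, 2)


def label(pauli):
    """Human-readable form of a Pauli, e.g. 'XIX' (signs are not tracked)."""
    names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
    return "".join(names[bits] for bits in zip(*pauli))
```

For instance, `label(placebo_gate(((1, 0, 0), (0, 0, 0))))` shows how a single X error on the first qubit before the gate becomes a correlated two-qubit error after it. Every Pauli is mapped to a Pauli again, which is the defining property of a Clifford gate.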

I regret that I cannot present numerical results for an example computation
that involves the Toffoli placebo construction. I have made preliminary
attempts, which however failed to give conclusive results before we
decided to freeze this project in order to first pursue another application of
the GraphSim Clifford simulator, which will be described in the following
chapter. However, even though I was not able to finish this work in the
present thesis by providing numerical examples for simulations of a placebo
construction, I still feel that this is a promising ansatz which may be worth
pursuing further.


4.5 Discussion 

Given a set of bare error parameters, network building blocks and a timing/parallelization
scheme, can we expect that a simulation of the described
kind gives an accurate estimate for the failure rate of the real quantum computation?
As long as only Clifford gates are involved this should be the case,
as the simulation is exact for Clifford networks. If non-Clifford gates are simulated
in the fashion presented in the previous section, a sceptic may doubt
that the "placebo" construction really captures all conceivable modes of failure.
For example, in the real construction the logical qubit leaves the code space for considerable
time, while in the placebo construction it stays in the code space. Without a rigorous
proof it is not justified to claim that the failure probability of the placebo
construction equals that of the real gate. However, it seems a very reasonable
assumption to say that the placebo construction will not be more likely to fail.

This last point should be elaborated more clearly. Fault-tolerant operation
requires that all gates within the chosen universal gate set can be performed
fault-tolerantly, and some gates may put stricter limits on the bare error
probability than others. As before, we consider the typical case where this
gate set consists of a number of Clifford gates and one non-Clifford gate, say,
the Toffoli gate. Let us first assume that our quantum computer is only capable
of performing Clifford gates, and that it is possible to perform such
calculations of arbitrary length with arbitrarily small probability of failure if
the bare error rate is below a threshold $\varepsilon_\text{C}$. The existence of such a threshold
is guaranteed by the threshold theorem. If we now improve the quantum
computer and give it the capability of performing the non-Clifford gate
needed to make its gate set universal, this change cannot make the threshold
larger. This is because the Clifford gates are still needed and also because the
fault-tolerant construction for this non-Clifford gate involves Clifford gates
which have to be performed fault-tolerantly. Hence, the threshold $\varepsilon_\text{U}$ for universal
quantum computation cannot be larger than that for pure Clifford operation,
$\varepsilon_\text{U} \le \varepsilon_\text{C}$.

We have argued that our simulation allows us to get sound estimates
with controlled error (which can, in principle, be made arbitrarily small by
running enough Monte Carlo simulations) for $\varepsilon_\text{C}$. Furthermore, if we use a
"placebo" construction, we may get a reasonable estimate for $\varepsilon_\text{U}$. However,
the error of this estimate cannot be made arbitrarily small because we cannot
exclude the possibility of systematic errors which are not accounted for.
Still, because of $\varepsilon_\text{U} \le \varepsilon_\text{C}$, our estimate for $\varepsilon_\text{C}$ will be (within the margin of
its statistical uncertainty, which can be made small) an upper bound to the
true threshold $\varepsilon_\text{U}$. Furthermore, the placebo estimate for $\varepsilon_\text{U}$ will be smaller than the
estimate for $\varepsilon_\text{C}$, i. e., be a better upper bound to the true value $\varepsilon_\text{U}$. We may
even expect it to be a rather tight upper bound if we agree to assume that the
residual systematic errors just discussed are small.

This hope is not unreasonable, but even if one is not willing to make this
assumption, we have gained something, because we need it only to claim
that the simulation results qualify as estimates for the true threshold value.
The claim that they are upper bounds holds even in the face of non-negligible
systematic errors, because these can only cause the estimate to be too large.
Virtually all published results on fault-tolerance thresholds are either strict
lower bounds or uncontrolled estimates. As there is good reason to feel uneasy
about uncontrolled estimates (i. e., estimates for which a statistical or
systematic uncertainty cannot be stated), it is useful to have a method that
provides upper bounds, because if these turn out to be close enough to the
estimates, they help to constrain the margin of error of these estimates.
Abstract

We study the preparation and distribution of high-fidelity multi-party entangled
states via noisy channels and operations. In the particular case of GHZ
and cluster states, we study different strategies using bipartite or multipartite
purification protocols. The most efficient strategy depends on the target
fidelity one wishes to achieve and on the quality of transmission channel and
local operations. We show the existence of a crossing point beyond which the
strategy making use of the purification of the state as a whole is more efficient
than a strategy in which pairs are purified before they are connected to the
final state. We also study the efficiency of intermediate strategies, including
sequences of purification and connection. We show that a multipartite strategy
is to be used if one wishes to achieve high fidelity, whereas a bipartite
strategy gives a better yield for low target fidelity.

[PACS: 03.67.Mn, 03.67.Hk, 03.67.Pp]

5.1 Introduction

In the past, (multipartite) entanglement has been mainly considered as a puzzling
artifact of quantum mechanics. More recently, however, the focus on
entanglement has shifted, as it was realized that entanglement also constitutes
a valuable resource for quantum information processing. Possible applications
of multipartite entanglement include certain security tasks in distributed
communication scenarios [HBB99, CL04, CGS02], the improvement
of frequency standards [WBI+92, WBIH94], as well as measurement-based
quantum computation schemes [RBB03].

In this context, the problem of generating multipartite entanglement of
high fidelity arises. If entangled states are to be distributed among spatially
separated parties, as is required e. g. in distributed communication scenarios,
the main obstacle comes from channel noise. Possible ways to overcome
channel noise and hence to successfully generate high-fidelity multipartite
entangled states have been developed. These methods are based on (i) quantum
error correction, making use of (concatenated) quantum error correction
codes [KL96], or (ii) entanglement purification [BBP+96, DEJ+96, DAB03,
ADB05]. While (i) is applicable to directly distribute arbitrary states, (ii) concentrates
on the generation of specific, maximally entangled pure states. The
generation of maximally entangled pairs of particles in turn allows one to distribute
arbitrary states by means of teleportation. In both cases, a substantial
overhead is required to guarantee successful, high-fidelity generation of
the desired states. In (i) this overhead arises from redundant encoding, enabling
one to perform error correction, while for (ii) several identical copies
need to be prepared and locally processed to generate high-fidelity entangled
states. The quantification of this overhead, or the quantum communication
cost, which we shall define more precisely, is the main concern of this article.

To be specific, we will concentrate on schemes based on entanglement purification.
These schemes are specially suited to generate entangled states of a
specific form, and are hence expected to perform better than general-purpose
schemes such as (i). In fact, a remarkable robustness of entanglement purification
protocols against noise in local operations (which we consider in
addition to channel noise) has been found [DAB03, ADB05]. That is, errors
of the order of several percent in local control operations can be tolerated, still
allowing for the generation of high-fidelity entangled states, even in the presence
of very noisy quantum channels and with only a moderate overhead.
For perfect local operations, the required overhead in resources is solely determined
by the noisy quantum channels. In this case, the channel capacity
[BKN00, DCH04] provides a suitable measure for this overhead. In a bipartite
communication scenario, the channel capacity gives the optimal rate of
quantum communication, i. e. the amount of quantum information transmitted
per actual channel usage. While one might think that the abstract notion
of channel capacity may also be applied to our problem (the generation
of certain high-fidelity entangled pure states), one actually faces a number
of difficulties. First, channel capacities are asymptotic quantities which are
very complicated to calculate; second, the definition of channel capacity is
not suitable to account for imperfect local operations (e. g. noise in local
coding and decoding procedures); and third, we are actually considering a
restricted problem, namely the generation of specific multipartite entangled
states, rather than the successful transmission of arbitrary quantum information.


We thus introduce a quantity closely related to quantum channel capacity,
namely the quantum communication cost $C_{F,G}$. $C_{F,G}$ denotes a family of
quantities which specify the number of uses of the noisy quantum channel
required to prepare a specific (multipartite) entangled state $|G\rangle$ with a fidelity
of at least $F$. In this paper, we will focus on target states $|G\rangle$ which are so-called
two-colourable graph states. These states include, for instance, GHZ
states and cluster states (a universal resource for measurement-based quantum
computation [RBB03]), and they are locally equivalent to codewords of
Calderbank-Shor-Steane error correcting codes [Ste96b, CS96]. We establish
upper bounds on $C_{F,G}$ by optimizing over a large class of different strategies
that generate these multipartite entangled states. These strategies include, as
extremal cases, (i) the generation and purification of pairwise entanglement,
from which, by suitable connection processes (or, alternatively, teleportation),
the desired multipartite states are generated; (ii) the generation and direct
multipartite purification of the desired target states. Intermediate strategies,
e. g. the purification of smaller states to high fidelity and their subsequent
connection to the desired larger state, will also be investigated. Depending
on the actual noise parameters for channels and local control operations, and
on the desired target fidelity $F$, the optimal strategy varies. For high target
fidelities, multipartite strategies turn out to be favorable.


This article is organized as follows: In Sec. 5.2, we present the concepts we
will use: We start with a review of the graph state formalism in order to
introduce notation and the two types of states we wish to distribute, namely
GHZ states and 1D cluster states. We shall also introduce a technique to
connect two smaller graph states to obtain a larger one. Then, we give details
for our noise models and review the employed purification protocols. Readers
familiar with these concepts may skip this section. Sec. 5.3 explains the
different strategies for employing the protocols that we wish to compare. The
actual comparison is done using extensive numerical Monte Carlo simulations,
and results are presented in Sec. 5.4. In order to corroborate these results
we have done analytical studies for certain restricted noise models
(Sec. 5.5). We conclude with a summary (Sec. 5.6).

5.2 Basic concepts

5.2.1 The graph-states formalism 

A graph $G = (V, E)$ is a collection $V = \{a, b, c, \ldots\}$ of $N = |V|$
vertices connected by edges $E \subset [V]^2$. The description of the edges is
given by the adjacency matrix $\Gamma_G$ associated with the graph,

$$(\Gamma_G)_{ab} = \begin{cases} 1, & \text{if } a \text{ and } b \text{ are connected by an edge, i.e. } \{a, b\} \in E, \\ 0, & \text{otherwise.} \end{cases}$$

The neighborhood $N_a \subset V$ of vertex $a$ is defined as the set of
vertices connected with it by an edge, $N_a = \{b : \{a, b\} \in E\}$.

With each graph G we associate a pure quantum state. If the graph's
vertex set can be separated into two sets A and B such that no edges exist
between vertices of the same set, we call it a two-colourable graph¹. The
vertices are qubits and the edges represent interactions.

There are three equivalent descriptions of graph states, which are
reviewed in the following sections (for a detailed treatment see [HEB04]):


Graph states in the stabilizer formalism 

Associated with a graph $G$ is a set of $N$ operators

$$K_G^{(a)} = \sigma_x^{(a)} \prod_{b \in N_a} \sigma_z^{(b)}. \qquad (5.1)$$

They form a complete set of commuting observables for the system of qubits
associated with the graph and therefore possess a set of common eigenstates
which form a basis of the Hilbert space. These eigenstates are called graph
states and are here written as $|G, \mu\rangle$, where the $a$-th component of
the vector $\mu \in \{0, 1\}^N$ is equal to 0 if
$K_G^{(a)} |G, \mu\rangle = |G, \mu\rangle$ and 1 if
$K_G^{(a)} |G, \mu\rangle = -|G, \mu\rangle$.
We abbreviate $|G\rangle := |G, 0\rangle$. We also sometimes suppress the
letter "G" and write just $|\mu\rangle$ if the context makes clear which
graph $G$ is meant.


Graph states in the interaction picture 

A graph state with $\mu = 0$ can be written in the computational basis in the
following manner:

$$|G\rangle = \prod_{\{a,b\} \in E} \Lambda Z^{(ab)} \, |+\rangle^{\otimes N}, \qquad (5.2)$$

where $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ and $\Lambda Z$ is the
controlled phase gate,

$$\Lambda Z^{(ab)} = |00\rangle_{ab}\langle 00| + |01\rangle_{ab}\langle 01| + |10\rangle_{ab}\langle 10| - |11\rangle_{ab}\langle 11|, \qquad (5.3)$$

which corresponds to an Ising-type interaction,
$\Lambda Z^{(ab)} = e^{-i \pi H^{(ab)}}$, with interaction Hamiltonian
$H^{(ab)}$ given by

$$H^{(ab)} = \tfrac{1}{2} \left( 1 - \sigma_z^{(a)} \right) \otimes \tfrac{1}{2} \left( 1 - \sigma_z^{(b)} \right) = |11\rangle_{ab}\langle 11|.$$

That is, $|G\rangle$ is generated from a pure product state by applying
interactions between all pairs of particles connected by edges.

¹ Mathematical literature prefers to call such graphs bipartite, but we
reserve this term to denote operations or settings comprised of two different
sites (locations).

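We list some useful relations for later reference: The $2^N$ common
eigenstates of the operators $K_G^{(a)}$ can be generated from $|G\rangle$ by
applying all possible products of $\sigma_z^{(a)}$, $a = 1, 2, \ldots, N$.
This can be seen from

$$K_G^{(b)} \sigma_z^{(a)} |G\rangle
  = \sigma_x^{(b)} \prod_{c \in N_b} \sigma_z^{(c)} \, \sigma_z^{(a)} |G\rangle
  = (-1)^{\delta_{ab}} \sigma_z^{(a)} \underbrace{\sigma_x^{(b)} \prod_{c \in N_b} \sigma_z^{(c)}}_{K_G^{(b)}} |G\rangle
  = (-1)^{\delta_{ab}} \sigma_z^{(a)} |G\rangle, \qquad (5.4)$$

which means that $\sigma_z^{(a)} |G, 0\rangle = |G, 0 \ldots 010 \ldots 0\rangle$,
with the 1 at position $a$. From this relation, together with the fact that
$\sigma_y^{(a)} = i \sigma_x^{(a)} \sigma_z^{(a)}$, one can deduce the effect
of $\sigma_x^{(a)}$ and $\sigma_y^{(a)}$ (see Refs. [HEB04] and [ADB05] for
proofs): Splitting the index vector $\mu$ into $\mu_a$, $\mu_{N_a}$
(neighborhood of vertex $a$), and $\mu_{R_a}$ (remaining vertices), we write

$$\sigma_z^{(a)} |G, \mu_a \mu_{N_a} \mu_{R_a}\rangle = |G, \bar\mu_a \mu_{N_a} \mu_{R_a}\rangle, \qquad (5.5)$$

$$\sigma_x^{(a)} |G, \mu_a \mu_{N_a} \mu_{R_a}\rangle = (-1)^{\mu_a} |G, \mu_a \bar\mu_{N_a} \mu_{R_a}\rangle, \qquad (5.6)$$

$$\sigma_y^{(a)} |G, \mu_a \mu_{N_a} \mu_{R_a}\rangle = i (-1)^{\bar\mu_a} |G, \bar\mu_a \bar\mu_{N_a} \mu_{R_a}\rangle, \qquad (5.7)$$

where the over-bar means bit complementation.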
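
The relations above can be checked numerically on a small example. The
following is a minimal sketch, assuming NumPy and a dense state-vector
representation (only feasible for a handful of qubits); the helper names
`op_on`, `graph_state`, `stabilizer` and `predicted` are illustrative and not
taken from the original work. It builds $|G, \mu\rangle$ from Eq. (5.2),
confirms the eigenvalue equations for $K_G^{(a)}$ of Eq. (5.1), and compares
the action of each Pauli operator with Eqs. (5.5)-(5.7).

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(single, a, n):
    """Embed a one-qubit operator on qubit a of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, single if k == a else I2)
    return out

def graph_state(n, edges, mu=None):
    """|G, mu> = prod_a sigma_z^(mu_a) prod_{ab in E} CZ^(ab) |+>^n, cf. Eq. (5.2)."""
    dim = 2 ** n
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)
    for idx in range(dim):
        # Qubit 0 is the most significant bit, matching the np.kron ordering.
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]
        sign = 1
        for a, b in edges:
            if bits[a] and bits[b]:
                sign = -sign
        if mu is not None:
            for a in range(n):
                if mu[a] and bits[a]:
                    sign = -sign
        psi[idx] *= sign
    return psi

def stabilizer(a, n, edges):
    """K_G^(a) = sigma_x^(a) prod_{b in N_a} sigma_z^(b), Eq. (5.1)."""
    k = op_on(X, a, n)
    for u, v in edges:
        if a in (u, v):
            k = k @ op_on(Z, v if u == a else u, n)
    return k

def predicted(pauli, a, mu, edges):
    """New index vector and phase after a Pauli on vertex a, Eqs. (5.5)-(5.7)."""
    nbrs = [v if u == a else u for u, v in edges if a in (u, v)]
    new = list(mu)
    if pauli in "zy":
        new[a] ^= 1              # sigma_z flips mu_a
    if pauli in "xy":
        for b in nbrs:
            new[b] ^= 1          # sigma_x flips the neighbourhood bits
    phase = {"z": 1, "x": (-1) ** mu[a], "y": 1j * (-1) ** (mu[a] ^ 1)}[pauli]
    return tuple(new), phase

n, edges = 4, [(0, 1), (1, 2), (2, 3)]       # a 4-qubit line graph
g = graph_state(n, edges)
assert all(np.allclose(stabilizer(a, n, edges) @ g, g) for a in range(n))

paulis = {"x": X, "y": Y, "z": Z}
for mu in itertools.product((0, 1), repeat=n):
    for name, a in itertools.product("xyz", range(n)):
        new_mu, phase = predicted(name, a, mu, edges)
        lhs = op_on(paulis[name], a, n) @ graph_state(n, edges, mu)
        assert np.allclose(lhs, phase * graph_state(n, edges, new_mu))
```

The function `predicted` works on the index vector alone, which is the kind of
bookkeeping that makes Pauli noise on graph states cheap to track.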

Graph states in the valence bond solid (VBS) picture 

Another description of graph states was introduced in Ref. [VC04b]. In this
picture, every edge is replaced by a pair in a maximally entangled state,
usually $(|00\rangle + |01\rangle + |10\rangle - |11\rangle)/2$. Each qubit
$a$ gets replaced by $d_a$ virtual qubits, where $d_a = |N_a|$ is the degree
of vertex $a$. The physical qubit is recovered by projecting the virtual
qubits onto the two-dimensional subspace of the physical one (see Fig. 5.1)
using as projector

$$P_{d_a} = |0\rangle\langle 0 \cdots 0| + |1\rangle\langle 1 \cdots 1|. \qquad (5.8)$$


Figure 5.1: Producing a graph state in the VBS picture.

Figure 5.2: Graphs for (a) cluster and (b) GHZ states.


Cluster states and GHZ states 

In this article, we study the purification of graph states using two
important representatives from this class as examples. By "cluster states",
we mean graph states associated with a regular lattice as graph, in this
article always a line as in Fig. 5.2a, and with $\mu = 0$. The term GHZ state
will in this article be used for a graph state (again with $\mu = 0$)
associated with a star-shaped graph $G^*$ as in Fig. 5.2b. Such a state can be
written as

$$|G^*\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |+\rangle^{\otimes (N-1)} + |1\rangle \otimes |-\rangle^{\otimes (N-1)} \right)$$

and is hence in its entanglement properties equivalent to an "ordinary" GHZ
state,
$\frac{1}{\sqrt{2}} \left( |0\rangle^{\otimes N} + |1\rangle^{\otimes N} \right) = \left( 1 \otimes \mathrm{Had}^{\otimes (N-1)} \right) |G^*\rangle$
(where $|\pm\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm |1\rangle)$ and Had is
the Hadamard operation
$\mathrm{Had} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$).


Bell pairs and graph-state formalism 

In order to keep a certain homogeneity, we will employ a new notation for
the states of the Bell basis, usually written as

$$|\Phi^\pm\rangle = \tfrac{1}{\sqrt{2}} (|00\rangle \pm |11\rangle), \qquad |\Psi^\pm\rangle = \tfrac{1}{\sqrt{2}} (|01\rangle \pm |10\rangle).$$

Applying a Hadamard operation on the second qubit, one obtains a new basis
formed by the graph states $|G_2, 00\rangle$, $|G_2, 01\rangle$,
$|G_2, 10\rangle$ and $|G_2, 11\rangle$, where $G_2$ denotes the graph
composed of two vertices and one edge. Our new notation shows directly the
relation between these two bases:

$$\begin{aligned}
|\Phi^+\rangle &=: |B; 00\rangle = \mathrm{Had}^{(2)} |G_2, 00\rangle \\
|\Psi^+\rangle &=: |B; 01\rangle = \mathrm{Had}^{(2)} |G_2, 01\rangle \\
|\Phi^-\rangle &=: |B; 10\rangle = \mathrm{Had}^{(2)} |G_2, 10\rangle \\
|\Psi^-\rangle &=: |B; 11\rangle = \mathrm{Had}^{(2)} |G_2, 11\rangle
\end{aligned} \qquad (5.9)$$

Figure 5.3: How to use the connection procedure of Sec. 5.2.1 to assemble
large (a) cluster and (b) GHZ states from smaller cluster or GHZ states.


Connection of graph-states 

In this section, we define a procedure to connect two graph states,
$|G_1\rangle$ with $N_1$ qubits and $|G_2\rangle$ with $N_2$ qubits, "fusing"
together their respective vertices $a_1$ and $a_2$, yielding a state
$|G\rangle$ with $N_1 + N_2 - 1$ qubits. This process is depicted in Fig. 5.3.
To realize this action, one applies a projective measurement on $a_1$ and
$a_2$, given by $P_2 = |0\rangle\langle 00| + |1\rangle\langle 11|$ and
$P_2^\perp = |0\rangle\langle 01| + |1\rangle\langle 10|$ (with outcomes 0
and 1). $P_2$ is defined as in the VBS picture. By similarity with this
picture, if the result of the measurement is 0, the final state is the graph
state resulting from the connection of $G_1$ and $G_2$. If one obtains 1, a
correction has to be done. As shown below, it is sufficient to apply
$\prod_{b \in N_{a_2}} \sigma_z^{(b)}$ to the resulting state. Recalling that
$K_G^{(a)} |G\rangle = |G\rangle$ with
$K_G^{(a)} = \sigma_x^{(a)} \prod_{b \in N_a} \sigma_z^{(b)}$, one sees that
any graph state can be decomposed as
$|G\rangle = |0\rangle_a \otimes |\chi\rangle + |1\rangle_a \otimes \prod_{b \in N_a} \sigma_z^{(b)} |\chi\rangle$.
Applying $\prod_{b \in N_{a_2}} \sigma_z^{(b)}$ to the state resulting from
$P_2^\perp$, one obtains

$$\begin{aligned}
\prod_{b \in N_{a_2}} \sigma_z^{(b)} P_2^\perp |G_1\rangle |G_2\rangle
 &= \prod_{b \in N_{a_2}} \sigma_z^{(b)} \Big( |0\rangle \otimes |\chi_1\rangle \otimes \prod_{d \in N_{a_2}} \sigma_z^{(d)} |\chi_2\rangle
    + |1\rangle \otimes \prod_{c \in N_{a_1}} \sigma_z^{(c)} |\chi_1\rangle \otimes |\chi_2\rangle \Big) \\
 &= |0\rangle \otimes |\chi_1\rangle \otimes |\chi_2\rangle
    + \prod_{b \in N_{a_1} \cup N_{a_2}} \sigma_z^{(b)} \, |1\rangle \otimes |\chi_1\rangle \otimes |\chi_2\rangle
  = |G_1 + G_2\rangle. \qquad (5.10)
\end{aligned}$$

Fig. 5.3 shows how to use this technique to assemble cluster and GHZ
states from $|G_2\rangle$ states.
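
As an illustration of the connection procedure, the following minimal sketch
fuses two two-qubit graph states into a three-qubit graph state. It assumes
NumPy and a dense tensor representation; the names `graph_state`, `fuse`,
`P2` and `P2perp` are chosen here for illustration only. Both measurement
outcomes are checked, the second one after the $\sigma_z$ correction on the
neighbours of $a_2$.

```python
import numpy as np

def graph_state(n, edges):
    """Amplitudes of |G> = prod CZ |+>^n (qubit 0 is the most significant bit)."""
    psi = np.ones(2 ** n) / np.sqrt(2 ** n)
    for idx in range(2 ** n):
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]
        for a, b in edges:
            if bits[a] and bits[b]:
                psi[idx] = -psi[idx]
    return psi

# One two-qubit graph state as a tensor with indices (vertex to fuse, partner).
pair = graph_state(2, [(0, 1)]).reshape(2, 2)

# Joint state of two such pairs, tensor indices (a1, b, a2, c).
joint = np.einsum("ab,cd->abcd", pair, pair)

# P_2 = |0><00| + |1><11| and the orthogonal outcome |0><01| + |1><10|,
# stored as tensors with indices (a, a1, a2).
P2 = np.zeros((2, 2, 2))
P2[0, 0, 0] = P2[1, 1, 1] = 1
P2perp = np.zeros((2, 2, 2))
P2perp[0, 0, 1] = P2perp[1, 1, 0] = 1

def fuse(projector, state):
    """Project (a1, a2) onto one qubit a; return the normalised state on (a, b, c)."""
    out = np.einsum("axy,xbyc->abc", projector, state)
    return out / np.linalg.norm(out)

# Expected result: vertex a connected to both b and c.
target = graph_state(3, [(0, 1), (0, 2)]).reshape(2, 2, 2)

out0 = fuse(P2, joint)
assert np.allclose(out0, target)

# Outcome 1: apply sigma_z to the neighbours of a2, which is vertex c here.
out1 = fuse(P2perp, joint)
out1 = np.einsum("cd,abd->abc", np.diag([1.0, -1.0]), out1)
assert np.allclose(out1, target)
```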

5.2.2 Noise model 
Channel noise 

In any realistic setting, channels will be noisy. We study the influence of
channel noise by considering restricted noise models, where the Kraus
representation of the superoperators is diagonal in the Pauli basis. This is a
common and usually sufficiently general model [DHCB05]. (In particular, any
noisy channel can be brought to such a form by means of (probabilistic) local
operations.) This allows for an efficient and convenient simulation by Monte
Carlo techniques (see Sec. 5.4.1). We consider the following channels:

phase-flip channel:

$$\rho \mapsto \mathcal{E}_z^{(a)}(\rho) = q \rho + (1 - q) \, \sigma_z^{(a)} \rho \, \sigma_z^{(a)} \qquad (5.11)$$

bit-flip channel:

$$\rho \mapsto \mathcal{E}_x^{(a)}(\rho) = q \rho + (1 - q) \, \sigma_x^{(a)} \rho \, \sigma_x^{(a)} \qquad (5.12)$$

depolarizing channel:

$$\rho \mapsto \mathcal{E}^{(a)}(\rho) = q \rho + \frac{1 - q}{3} \left( \sigma_x^{(a)} \rho \, \sigma_x^{(a)} + \sigma_y^{(a)} \rho \, \sigma_y^{(a)} + \sigma_z^{(a)} \rho \, \sigma_z^{(a)} \right) \qquad (5.13)$$

For the depolarizing channel, we define

$$p = (4q - 1)/3,$$

which allows us to rewrite Eq. (5.13) as

$$\rho \mapsto \mathcal{E}^{(a)}(\rho) = p \rho + \frac{1 - p}{4} \left( \rho + \sigma_x^{(a)} \rho \, \sigma_x^{(a)} + \sigma_y^{(a)} \rho \, \sigma_y^{(a)} + \sigma_z^{(a)} \rho \, \sigma_z^{(a)} \right). \qquad (5.14)$$

$(1 - q)$ will be called the alteration probability and $p$ the reliability.
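
The equivalence of the two parametrizations of the depolarizing channel can
be confirmed with a few lines of code. This is a minimal sketch on
single-qubit density matrices, assuming NumPy; the function names are
illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def phase_flip(rho, q):
    """Eq. (5.11)."""
    return q * rho + (1 - q) * (Z @ rho @ Z)

def bit_flip(rho, q):
    """Eq. (5.12)."""
    return q * rho + (1 - q) * (X @ rho @ X)

def depolarize_q(rho, q):
    """Eq. (5.13): alteration probability 1 - q, split evenly over x, y, z."""
    return q * rho + (1 - q) / 3 * sum(s @ rho @ s for s in (X, Y, Z))

def depolarize_p(rho, p):
    """Eq. (5.14): reliability p, uniform mixture over 1, x, y, z otherwise."""
    return p * rho + (1 - p) / 4 * (rho + sum(s @ rho @ s for s in (X, Y, Z)))

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # arbitrary test state
q = 0.85
p = (4 * q - 1) / 3
assert np.allclose(depolarize_q(rho, q), depolarize_p(rho, p))
# Reliability p = 0 leaves the completely mixed state.
assert np.allclose(depolarize_p(rho, 0.0), np.eye(2) / 2)
```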





Local noise 

As part of the purification protocols, local one- and two-qubit unitary
operations are employed which may be noisy. An imperfect operation is modeled
by preceding the perfect operation $U^{(a,b)}$ with the application of one of
the noise superoperators $\mathcal{E}$ from Eqs. (5.11-5.13), i. e. the state
is transformed as

$$\rho \mapsto U^{(a,b)} \, \mathcal{E}^{(a)}\!\left( \mathcal{E}^{(b)}(\rho) \right) U^{(a,b)\dagger}.$$

We assume that the protocols are executed with the least possible number of
operations to keep accumulated noise low. Hence, if a two-qubit gate
$U_2^{(a,b)}$ is preceded by one-qubit gates $U_1^{(a)}$ and $U_1^{(b)}$, we
apply one combined unitary $U^{(a,b)} = U_2^{(a,b)} U_1^{(a)} U_1^{(b)}$,
which is subjected to noise only once.

Commutation between connection and noise 

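We now state an observation that will later (in Sec. 5.5.2) be of use.

For any graph states $|G_1\rangle$ and $|G_2\rangle$ which are connected by
the procedure described in Sec. 5.2.1, one can show that the noise processes
commute with the connection procedure if they are described by a
superoperator containing only $\sigma_z$ Pauli operators. This restriction
comes from the fact that the neighborhood of the connected vertices $a_1$ and
$a_2$ changes with the connection and hence $\sigma_x$ and $\sigma_y$ Pauli
operators will affect different vertices (see Eqs. (5.5), (5.6) and (5.7)).

The commutation rules between the projector $P_2$ (see Eq. (5.8)) and
$\sigma_z$ can be deduced from the following expression of the connected
graph state:

$$\begin{aligned}
P_2 |G_1\rangle |G_2\rangle = P_2 \Big( & |0\rangle_{a_1} |0\rangle_{a_2} |\chi_1\rangle |\chi_2\rangle
 + \prod_{c \in N_{a_2}} \sigma_z^{(c)} |0\rangle_{a_1} |1\rangle_{a_2} |\chi_1\rangle |\chi_2\rangle \\
 & + \prod_{b \in N_{a_1}} \sigma_z^{(b)} |1\rangle_{a_1} |0\rangle_{a_2} |\chi_1\rangle |\chi_2\rangle
 + \prod_{b \in N_{a_1}} \sigma_z^{(b)} \prod_{c \in N_{a_2}} \sigma_z^{(c)} |1\rangle_{a_1} |1\rangle_{a_2} |\chi_1\rangle |\chi_2\rangle \Big)
\end{aligned}$$

Recalling that $\sigma_z |0\rangle = |0\rangle$ and
$\sigma_z |1\rangle = -|1\rangle$, one can show that:

$$P_2 \, \sigma_z^{(a_1)} |G_1\rangle |G_2\rangle = \sigma_z^{(a)} P_2 |G_1\rangle |G_2\rangle \qquad (5.15)$$

$$P_2 \, \sigma_z^{(a_2)} |G_1\rangle |G_2\rangle = \sigma_z^{(a)} P_2 |G_1\rangle |G_2\rangle \qquad (5.16)$$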

5.2.3 Local noise equivalent 

To judge how close a state $\rho$ is to the desired state $|\psi\rangle$, one
usually employs the fidelity $F := \langle\psi|\rho|\psi\rangle$. However, it
may be advantageous to re-gauge the fidelity by introducing the following
derived measure: We define the local noise equivalent (LNE) as the level of
local depolarizing noise (in terms of the alteration probability $(1 - q)$ of
Eq. (5.13)) that one has to apply to each qubit of the perfect state
$|\psi\rangle$ to deteriorate it to the same fidelity $F$ as $\rho$ has. The
advantage of this measure is twofold: (i) It is more natural for uses of
states in quantum error correction schemes, as it can be compared directly to
the fault-tolerance threshold in case of uncorrelated-noise models. (ii) It
does not fall off exponentially with the size of a state for constant noise
levels, as the fidelity does. On the other hand, it often cannot be
calculated analytically in a straightforward way. We hence used a numerical
Monte Carlo simulation of the state deterioration (which is why the LNE scale
in the figures has error bars).
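
For very small graph states the LNE can also be obtained without sampling.
The sketch below is not the Monte Carlo procedure used for the figures: it
enumerates all $4^N$ Pauli error patterns exactly, tracks the index vector
with the bit-flip rules of Eqs. (5.5)-(5.7), and then inverts the fidelity by
bisection. All function names are illustrative, and the exponential cost
restricts it to a few qubits.

```python
import itertools

def fidelity_after_local_depolarizing(n, edges, q):
    """Fidelity of |G,0> after each qubit passes through the channel of Eq. (5.13)."""
    nbrs = [[v if u == a else u for u, v in edges if a in (u, v)]
            for a in range(n)]
    fid = 0.0
    for errors in itertools.product("ixyz", repeat=n):
        mu = [0] * n
        prob = 1.0
        for a, e in enumerate(errors):
            prob *= q if e == "i" else (1 - q) / 3
            if e in "zy":
                mu[a] ^= 1              # sigma_z part flips mu_a
            if e in "xy":
                for b in nbrs[a]:
                    mu[b] ^= 1          # sigma_x part flips the neighbours
        if not any(mu):                 # error pattern leaves |G,0> unchanged
            fid += prob
    return fid

def local_noise_equivalent(n, edges, fidelity, tol=1e-10):
    """Bisect for the alteration probability 1 - q that gives the stated fidelity."""
    lo, hi = 0.0, 0.75                  # 1 - q = 3/4 is complete depolarization
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fidelity_after_local_depolarizing(n, edges, 1 - mid) > fidelity:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: a three-qubit star graph (GHZ-type state) observed with fidelity 0.8.
lne = local_noise_equivalent(3, [(0, 1), (0, 2)], 0.8)
```

The bisection relies on the fidelity decreasing monotonically as the noise
level grows from 0 to 3/4, which holds for graph states under this channel.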

5.2.4 Purification protocols 

The purpose of entanglement purification is the following: One is given an
ensemble of multi-party states, which all are distributed over two (or more)
sites and exhibit entanglement between the sites. These states are only an
approximation to the desired state $|\Psi\rangle\langle\Psi|$ with an
insufficient fidelity, which one wishes to improve. As the sites are
spatially separated, one cannot apply joint operations on the distributed
parts of a state. Instead, one takes (in case of the so-called recurrence
protocols, which are the only ones considered here) pairs of entangled
states, performs joint operations on them, and then measures one of the
states in order to gain information about the other. The other state is kept
only for specific measurement outcomes. After iterating this procedure, one
is left with a smaller ensemble of higher fidelity.

Bipartite purification 

Several protocols have been proposed to purify bipartite entangled states
[BBP+96, BDSW96, DEJ+96]. To test the different strategies, we used the most
efficient one that can be used to purify an ensemble of $|\Phi^+\rangle$
states, namely the one described in Ref. [DEJ+96]. We present here a modified
version of this bipartite entanglement purification protocol (BEPP) which
allows for the purification of the connected graph-state pair. As we are
concerned only with this graph in this section, we simply write
$|\mu\nu\rangle$ for the different basis states $|G_2, \mu\nu\rangle$. Recall
that
$|00\rangle = \frac{1}{\sqrt{2}} (|0\rangle |+\rangle + |1\rangle |-\rangle)$
(see Eq. (5.9)).

Alice and Bob want to share entangled pairs with high fidelity. At the
beginning they are given an ensemble of noisy $|00\rangle$ states, each of
them owning one part of the pairs. We consider a state diagonal in the
graph-state basis,

$$\rho = x_{00} |00\rangle\langle 00| + x_{01} |01\rangle\langle 01| + x_{10} |10\rangle\langle 10| + x_{11} |11\rangle\langle 11|. \qquad (5.17)$$

We remark that such a standard form can always be achieved by means of
depolarization, i. e. by applying certain (random) local unitary operations.
Each step of the protocol consists of the following operations: (i) Alice and
Bob perform unitary operations on their particles, with Alice's and Bob's
unitaries given by

SA= Ti{-i v)' s ->=^ Had 0 0 Hadt - 

(ii) Alice performs a CNOT operation from the first state to the second and
Bob from the second state to the first; (iii) Alice and Bob measure the
second state in different bases. To see the effect of this procedure, we
calculate the fidelity and yield obtained after one step with two initial
states given by Eq. (5.17).

In (i), Alice and Bob apply $S_A$ and $S_B$, respectively, in order to swap
$|11\rangle$ and $|10\rangle$. Then, in step (ii), they apply the bilateral
CNOT. One can check that the effect of this operation on the graph-state
basis is given by the following map:

$$|\mu_A, \mu_B\rangle \, |\nu_A, \nu_B\rangle \mapsto |\mu_A \oplus \nu_A, \mu_B\rangle \, |\nu_A, \nu_B \oplus \mu_B\rangle. \qquad (5.18)$$

(Here, $\oplus$ indicates bitwise XOR, i. e. addition modulo 2.) Last (iii),
Alice and Bob measure the qubits of the target state. This is done in the
eigenbasis $\{|0\rangle_x, |1\rangle_x\}$ of $\sigma_x$ for Alice and in the
computational basis $\{|0\rangle_z, |1\rangle_z\}$ for Bob. By this they
obtain the eigenvalue of the correlation operator $K^{(2)}$ defined in
Eq. (5.1) and determine the value of the second bit describing the state. If
it is 0, they keep the first state. They discard it otherwise.

After the measurement, they keep the control state with success probability
$k = (x_{00} + x_{11})^2 + (x_{01} + x_{10})^2$, and the new coefficients are
given by:

$$\begin{aligned}
x'_{00} &= (x_{00}^2 + x_{11}^2)/k \\
x'_{01} &= (x_{01}^2 + x_{10}^2)/k \\
x'_{10} &= (2 x_{00} x_{11})/k \\
x'_{11} &= (2 x_{01} x_{10})/k.
\end{aligned} \qquad (5.19)$$

Hence, the fidelity is $F = x'_{00} = (x_{00}^2 + x_{11}^2)/k$. The yield of
the step, defined as the number of remaining states divided by the number of
states before the step, is given by $k/2$, as half of the states (the
targets) are measured and discarded.

The unitary operations performed at the beginning of the protocol (step
(i)) are required for its convergence. They guarantee that fidelity 1 is a
fixed point of the protocol which is approached when iterating the procedure.
The CNOT operation is a means of transferring information from the first
qubit to the second. The measurement allows one to distinguish between
$\{|0,0\rangle, |1,0\rangle\}$ and $\{|0,1\rangle, |1,1\rangle\}$ and hence
determines the second bit of the index vector.
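
The recurrence of Eq. (5.19) is easy to iterate numerically. The following
minimal sketch (noiseless local operations; the function names and the
Werner-type starting point are illustrative assumptions) applies several BEPP
steps and accumulates the yield $k/2$ per step.

```python
def bepp_step(x00, x01, x10, x11):
    """One noiseless BEPP step, Eq. (5.19); returns new coefficients and k."""
    k = (x00 + x11) ** 2 + (x01 + x10) ** 2
    new = ((x00 ** 2 + x11 ** 2) / k,
           (x01 ** 2 + x10 ** 2) / k,
           2 * x00 * x11 / k,
           2 * x01 * x10 / k)
    return new, k

def werner(f):
    """Illustrative input: fidelity f, remaining weight spread evenly."""
    rest = (1 - f) / 3
    return (f, rest, rest, rest)

state, total_yield = werner(0.8), 1.0
history = [state[0]]
for step in range(5):
    state, k = bepp_step(*state)
    total_yield *= k / 2          # half of the pairs are measured each step
    history.append(state[0])
# history holds the fidelity x00 after each step, total_yield the overall yield.
```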

Multipartite purification 

Multipartite entanglement purification protocols (MEPP) have been introduced
in Ref. [MPP+98] for GHZ states, were further developed in Ref. [MS02] and
extended to all two-colourable graph states in Refs. [DAB03, ADB05]. Recall
that a two-colourable graph state is a graph state in which the vertices can
be separated into two sets $V_A$ and $V_B$ such that no edges exist between
vertices of the same set. Using the procedure described in Ref. [ADB05], one
can depolarize any mixed state $\rho$ to the form

$$\rho = \sum_{\mu_A, \mu_B} \lambda_{\mu_A, \mu_B} \, |G, \mu_A, \mu_B\rangle\langle G, \mu_A, \mu_B| \qquad (5.20)$$

without changing the diagonal coefficients (where $\mu_A$, $\mu_B$ are binary
vectors corresponding to the sets $V_A$, $V_B$, respectively). Hence we will
restrict our attention to input states of this form. The protocol is composed
of two subprotocols P1 and P2, which we describe here:


Subprotocol P1: The states composing the ensemble of two-colourable
graph states are processed pairwise. All parties belonging to set $V_A$
perform a CNOT operation from the second state of a pair of states to the
first one, while the parties belonging to set $V_B$ perform a CNOT from the
first one to the second one. This leads to the transformation

$$|G, \mu_A, \mu_B\rangle \, |G, \nu_A, \nu_B\rangle \mapsto |G, \mu_A, \mu_B \oplus \nu_B\rangle \, |G, \nu_A \oplus \mu_A, \nu_B\rangle. \qquad (5.21)$$

As in the bipartite protocol, the last step consists of measuring the second
state of the pair. The parties belonging to set $V_A$ measure their qubit $a$
in the eigenbasis $\{|0\rangle_x, |1\rangle_x\}$ of $\sigma_x$, obtaining
results $\xi_a \in \{0, 1\}$, while the ones belonging to set $V_B$ make
their measurement in the computational basis, obtaining results
$\zeta_b \in \{0, 1\}$. From this, we can calculate the part of the index
vector of the measured state (second state of the r. h. s. of Eq. (5.21))
corresponding to set $V_A$:

$$\nu_A \oplus \mu_A = \Big( \xi_a \oplus \bigoplus_{b \in N_a} \zeta_b \Big)_{a \in V_A}.$$

If this is 0, it is most likely that $\mu_A = 0$, and hence the first state
is kept (and otherwise discarded). As a consequence, in the expansion (5.20)
of the ensemble density matrix, elements of the form $\lambda_{0, \mu_B}$ are
increased. One finds that the new matrix elements are given by

$$\lambda'_{\gamma_A, \gamma_B} = \frac{1}{\kappa} \sum_{\{(\nu_B, \mu_B) \,|\, \nu_B \oplus \mu_B = \gamma_B\}} \lambda_{\gamma_A, \nu_B} \, \lambda_{\gamma_A, \mu_B}, \qquad (5.22)$$

where $\kappa$ is a normalization constant such that $\mathrm{tr}(\rho) = 1$.
Subprotocol P2: As explained above, subprotocol P1 is employed to purify
with respect to the eigenvalues $\mu_A$ associated with set $V_A$. The second
subprotocol leads to the purification with respect to the eigenvalues of set
$V_B$. It is obtained from P1 by exchanging the roles of sets $V_A$ and
$V_B$. The protocol's action is described by the following map:

$$|G, \mu_A, \mu_B\rangle \, |G, \nu_A, \nu_B\rangle \mapsto |G, \mu_A \oplus \nu_A, \mu_B\rangle \, |G, \nu_A, \nu_B \oplus \mu_B\rangle. \qquad (5.23)$$

One measures the second state. The measurements on set $V_B$ are done in the
eigenbasis $\{|0\rangle_x, |1\rangle_x\}$ of $\sigma_x$, while they are done
in the computational basis in set $V_A$. This leads to the determination of
the part $\nu_B \oplus \mu_B$ of the index vector. As in subprotocol P1, one
keeps the state if $\nu_B \oplus \mu_B = 0$. The new coefficients are given
by

$$\lambda'_{\gamma_A, \gamma_B} = \frac{1}{\kappa} \sum_{\{(\nu_A, \mu_A) \,|\, \nu_A \oplus \mu_A = \gamma_A\}} \lambda_{\nu_A, \gamma_B} \, \lambda_{\mu_A, \gamma_B}, \qquad (5.24)$$

where $\kappa$ is a normalization constant such that $\mathrm{tr}(\rho) = 1$,
as before.
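
The coefficient maps of Eqs. (5.22) and (5.24) can be written compactly if
the index vectors $\mu_A$ and $\mu_B$ are encoded as integers, so that
$\oplus$ becomes the integer XOR. The sketch below assumes NumPy, noiseless
operations and a coefficient array `lam[mu_A, mu_B]`; the names `p1`, `p2`
and the starting ensemble are illustrative.

```python
import numpy as np

def p1(lam):
    """Eq. (5.22): XOR-convolution over mu_B at fixed gamma_A, then normalise."""
    na, nb = lam.shape
    new = np.zeros_like(lam)
    for g_a in range(na):
        for nu_b in range(nb):
            for mu_b in range(nb):
                new[g_a, nu_b ^ mu_b] += lam[g_a, nu_b] * lam[g_a, mu_b]
    kappa = new.sum()                # probability of keeping the first state
    return new / kappa, kappa

def p2(lam):
    """Eq. (5.24): the same map with the roles of V_A and V_B exchanged."""
    new, kappa = p1(lam.T)
    return new.T, kappa

# Three-qubit star graph: centre in V_A (1 bit), two leaves in V_B (2 bits).
# Illustrative input: fidelity 0.7, remaining weight spread evenly.
lam = np.full((2, 4), (1 - 0.7) / 7)
lam[0, 0] = 0.7

fidelities = [lam[0, 0]]
for _ in range(3):
    lam, _ = p1(lam)
    lam, _ = p2(lam)
    fidelities.append(lam[0, 0])     # fidelity after each P1 + P2 round
```

A single application of P1 need not raise the fidelity, since it removes
$V_A$ errors while accumulating $V_B$ errors; the gain shows up after P2 has
been applied as well.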

5.3 Strategies 

5.3.1 Quantum communication cost $C_{F,G}$

We now define our figure of merit, the quantum communication cost. We
consider $N$ spatially separated parties $A_k$, $k = 1, 2, \ldots, N$, which
are pairwise connected by noisy quantum channels $\mathcal{E}_{kl}$,
described by completely positive maps acting on density operators for qubits.
We will quantify the quantum communication through these quantum channels
using the quantum communication cost $C_{kl}$, i. e. the number of usages of
the quantum channel $\mathcal{E}_{kl}$, while classical communication between
pairs of parties will be considered to be for free. Sending a single qubit
through the quantum channel $\mathcal{E}_{kl}$ costs 1 unit, i. e.
$C_{kl} = 1$, while the transmission of an arbitrary state of $M$ qubits
costs $C_{kl} = M$. We will be interested in the total quantum communication
cost $C$, where

$$C = \sum_{k < l} C_{kl}. \qquad (5.25)$$

Figure 5.4: Distribution of N-qubit GHZ states over noisy channels. (a)
Bipartite entanglement purification strategy: Bell pairs are sent over the
channels and purified using a BEPP. The purified pairs are then connected
(using the procedure of Fig. 5.3) to the desired GHZ state. (b) Multipartite
entanglement purification strategy: Alice prepares the GHZ state locally and
sends all but one of the particles through the channels. Then, the MEPP
protocol is used.

We consider the generation of multipartite entangled states (graph states, to
be specific) $|G\rangle$ distributed among the parties $A_k$. The goal is to
generate states $\Lambda = \bigotimes_j \rho_j$, where the fidelity of each
$\rho_j$, $F_j = \langle G| \rho_j |G\rangle$, fulfills $F_j > F$. That is,
each of the states has a fidelity larger than a threshold value $F$, which we
call the "desired target fidelity". We remark that we demand that the
ensemble of output states is in a tensor product form. In principle, weaker
requirements, such as that only the reduced density operators of $\Lambda$
have fidelity larger than $F$, are conceivable; however, one faces certain
difficulties in this case. For instance, it is not clear whether each of the
copies of the state can be independently used for further quantum information
processing tasks, due to possible classical correlations among the copies.
Hence, we deliberately demand the tensor product structure. We will be
interested in the total quantum communication cost $C$ required to generate
$\Lambda = \bigotimes_j \rho_j$ with $F_j > F$. In particular, we consider
the quantum communication cost per copy,

Cf,g = (5-26) 

where one optimizes over all possible strategies to generate $\Lambda$. Due
to this optimization, the quantity $C_{F,G}$ is very difficult to calculate.
Hence we restrict ourselves to establishing upper bounds on $C_{F,G}$ by
considering explicit strategies to generate high fidelity multipartite
entangled states.

Multiple variations of this problem are conceivable. For simplicity we will
assume that all parties are pairwise connected by identical quantum channels,
$\mathcal{E}_{kl} = \mathcal{E}$. Inhomogeneous situations where only some
pairs of parties are connected by quantum channels (a restricted
communication web), or where the classical communication is limited, or cases
where quantum channels between different pairs of parties are different
(i. e. have different noise parameters) will not be considered here.

We will look mainly at two scenarios depicted in Fig. 5.4, which we describe
now.


5.3.2 Bipartite purification strategy 


In the BEPP strategy, the parties $A_k$, $k = 1, 2, \ldots, N$, wish to
create a shared ensemble of N-qubit graph states of high fidelity using a
BEPP, where party $A_k$ holds the qubit corresponding to vertex $a_k$. For
each edge of the graph, one of the two parties connected by this edge
prepares a connected graph-state pair, $|G_2, 00\rangle$ (equivalent to a
Bell pair up to a local unitary), and sends one qubit of the pair to the
other party through a noisy channel. (Alternatively, one could use a
teleportation-based strategy: Alice distributes Bell pairs to the $N - 1$
other parties. The pairs are purified and then used to distribute the
multipartite state that Alice has prepared locally.) The effect of the
channels is given by Eq. (5.13), leading to states of fidelity $F = q$ that
are diagonal in the graph-state basis. The parties repeat the operation $M$
times so that at the end $M |E|$ entangled pairs are distributed between the
different partners, where $|E|$ is the number of edges in the graph. The BEPP
(reviewed in Section 5.2.4) is then applied. This leads to a smaller ensemble
of states given by a density matrix of the same form but with higher
fidelity. Finally, the connection procedure described in Sec. 5.2.1 is
applied: Each party merges together the $|N_{a_k}|$ qubits which will connect
vertex $a_k$ with its neighbors, leading to the desired graph state. We call

$$Y = \frac{\#\,\text{final states}}{\#\,\text{initial states}} = \frac{\#\,\text{final states}}{M}$$

the yield of the production of final states with fidelity $F$. To build up
the desired multipartite state $|G\rangle$, we need one $|G_2\rangle$ pair
for each edge of $G$. The number of edges for 1D cluster and GHZ states is
(as for any tree graph) $|E| = N - 1$. Hence, the quantum communication cost
is related to the yield by

$$C_{F,G} = \frac{N - 1}{Y}. \qquad (5.27)$$

The numerator is the number of channel uses (i. e. number of transmitted
qubits) required to distribute one state. This dependence on the size of the
state properly reflects that for larger states, already the preparation of
the raw states is more costly. To allow for easier comparison with the yield,
a figure that may feel more familiar to the reader, we have plotted in all
graphs the inverse communication cost $1/C_{F,G} = Y(F)/(N - 1)$, which is
proportional to the yield.


5.3.3 Multipartite purification strategy 

Alternatively, in the MEPP strategy, a central party, called Alice, creates
$M$ N-qubit graph states locally. For each graph state, she keeps one qubit
and sends the other $N - 1$ qubits through the channels to the $N - 1$ other
parties. The resulting states are then purified using direct multipartite
entanglement purification, i. e. the MEPP reviewed in Sec. 5.2.4. Hence, to
distribute one state, we need $N - 1$ channel uses, the same as in the BEPP
case. Thus, Eq. (5.27) holds for MEPP as well.


5.3.4 Mixing of strategies 

Assume that the application of, say, $m$ steps of one of the protocols
mentioned above reaches a final fidelity $F_1$ with a communication cost of
$C_1$, and application of $m + 1$ steps achieves fidelity $F_2 > F_1$ with
communication cost $C_2 > C_1$ (i. e. $Y_2 < Y_1$). For a certain
application, a fidelity of $F$ with $F_1 < F < F_2$ is required, i. e. $m$
steps are insufficient, but $m + 1$ steps achieve a higher fidelity than
desired at the cost of lower yield. In this case, one can find a compromise
between the two strategies by mixing ensembles:

Choosing an $\alpha \in [0, 1]$, one prepares $M$ raw states and then uses
the first strategy on $\alpha M$ of them in order to gain $\alpha M Y_1$
states of fidelity $F_1$, and the second strategy on the remaining
$(1 - \alpha) M$ states to obtain $(1 - \alpha) M Y_2$ states of fidelity
$F_2$. Mixing these states gives an ensemble of fidelity

$$F = \frac{\alpha Y_1 F_1 + (1 - \alpha) Y_2 F_2}{\alpha Y_1 + (1 - \alpha) Y_2} \qquad (5.28)$$

with a yield $Y = \alpha Y_1 + (1 - \alpha) Y_2$. This method allows one to
obtain intermediate fidelities with a better yield. The communication cost
mixes according to

$$\frac{1}{C} = \frac{\alpha}{C_1} + \frac{1 - \alpha}{C_2}. \qquad (5.29)$$
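
To make the mixing rule concrete, Eq. (5.28) can be solved for the weight
$\alpha$ that reaches a prescribed fidelity $F$, which gives
$\alpha = Y_2 (F_2 - F) / [Y_1 (F - F_1) + Y_2 (F_2 - F)]$. The sketch below
implements this together with Eqs. (5.27)-(5.29); the function names and the
numerical values are purely illustrative and not taken from the simulations.

```python
def mixing_weight(f, f1, y1, f2, y2):
    """alpha for which Eq. (5.28) yields fidelity f, assuming f1 <= f <= f2."""
    return y2 * (f2 - f) / (y1 * (f - f1) + y2 * (f2 - f))

def mix(alpha, f1, y1, f2, y2):
    """Fidelity (Eq. 5.28) and yield of the mixed ensemble."""
    y = alpha * y1 + (1 - alpha) * y2
    f = (alpha * y1 * f1 + (1 - alpha) * y2 * f2) / y
    return f, y

def mixed_cost(alpha, c1, c2):
    """Communication cost of the mixed ensemble, Eq. (5.29)."""
    return 1 / (alpha / c1 + (1 - alpha) / c2)

# Hypothetical numbers: m steps give (F1, Y1), m + 1 steps give (F2, Y2).
f1, y1, f2, y2 = 0.90, 0.20, 0.96, 0.08
n_qubits = 4
alpha = mixing_weight(0.93, f1, y1, f2, y2)
f, y = mix(alpha, f1, y1, f2, y2)
c = mixed_cost(alpha, (n_qubits - 1) / y1, (n_qubits - 1) / y2)
# c agrees with (N - 1) / y, i.e. Eq. (5.27) applied to the mixed yield.
```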


5.3.5 Intermediate strategies 

As a "compromise" between BEPP and MEPP, we shall also consider the
following set of strategies: Assemble small states of $N_1$ qubits, send them
through the channels, purify them, and then use the connection scheme
described in Sec. 5.2.1 to connect $L$ of the $N_1$-qubit states to one state
with $N = L N_1 - L + 1$ qubits.


5.4 Numerical simulations 

5.4.1 Technique 

In generic cases, an explicit numerical simulation of a quantum system is
intractable due to the exponential growth of the Hilbert space with the
number of involved particles or qubits. In our case, however, an efficient
simulation is possible for two reasons: (i) All gates that are employed by
the protocols are elements of the so-called Clifford group and hence, the
Gottesman-Knill theorem applies, which allows for efficient simulations of
pure state evolutions. (ii) The considered noise channels have Kraus
representations that are diagonal in the Pauli basis.

To explain (i), we start by reviewing the Gottesman-Knill theorem
[Got98a, NC00]. It says that it is possible to simulate so-called stabilizer
circuits efficiently on a classical computer. These are quantum circuits
containing only preparation of computational basis states, operations from
the Clifford group, and measurements in the computational basis. The
N-qubit Clifford group $\mathcal{C}_N$ is the group of those unitary
operations that map Pauli operators onto Pauli operators under conjugation,
i. e.

$$\mathcal{C}_N := \left\{ U \in SU(2^N) \;\middle|\; U P U^\dagger \in \mathcal{P}_N \ \forall P \in \mathcal{P}_N \right\},$$

$$\mathcal{P}_N := \{\pm 1, \pm i\} \cdot \{1, \sigma_x, \sigma_y, \sigma_z\}^{\otimes N}. \qquad (5.30)$$

It happens to contain all the operations that we need for purifying, and
hence, we can simulate the execution of the purification protocols described
in Section 5.2.4.
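
The defining property in Eq. (5.30) can be tested by brute force for one- and
two-qubit gates. The following minimal sketch assumes NumPy and dense
matrices; `is_pauli`, `embed` and `is_clifford` are illustrative names. Since
conjugation is multiplicative, it suffices to check the generators
$\sigma_x^{(a)}$ and $\sigma_z^{(a)}$ of the Pauli group. Global phases of
the gate are irrelevant for this test, so the gates are not rescaled to unit
determinant.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
PHASES = [1, -1, 1j, -1j]

def is_pauli(m, n):
    """True if m equals a phase in {+1, -1, +i, -i} times an n-qubit Pauli string."""
    for string in itertools.product(PAULIS, repeat=n):
        p = string[0]
        for s in string[1:]:
            p = np.kron(p, s)
        if any(np.allclose(m, ph * p) for ph in PHASES):
            return True
    return False

def embed(single, a, n):
    """One-qubit operator acting on qubit a of n qubits."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, single if k == a else I2)
    return out

def is_clifford(u, n):
    """Check U P U^dagger in P_n for the generators sigma_x^(a), sigma_z^(a)."""
    gens = [embed(s, a, n) for s in (X, Z) for a in range(n)]
    return all(is_pauli(u @ g @ u.conj().T, n) for g in gens)

HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
T_GATE = np.diag([1, np.exp(1j * np.pi / 4)])

assert is_clifford(HAD, 1) and is_clifford(CZ, 2) and is_clifford(CNOT, 2)
assert not is_clifford(T_GATE, 1)   # maps sigma_x to (sigma_x + sigma_y)/sqrt(2)
```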

Aaronson and Gottesman have given a fast algorithm which can perform
such a simulation, and also supplied an implementation in the C programming
language [AG04]. We used this software at the beginning of our studies, but
after realizing that its performance was not sufficient for our purposes, we
developed a new, faster algorithm, which is described elsewhere [AB06].

The state represented in our simulator is always a pure state.² However, in
entanglement purification, one usually deals with mixed states, represented
as density matrices. Nevertheless, because condition (ii) is fulfilled, we
can get around this problem using a Monte Carlo technique, which we describe
now.

To represent the ensembles of states we start with a large number of qubits,
typically several thousand times the number of qubits in the states to be
purified. The qubits are initialized to a tensor product of $|G, 0\rangle$
states. Note that all these qubits can potentially get entangled, and hence
have to be part of the same simulated quantum register. This would be
prohibitive without a very efficient algorithm for the stabilizer simulation.

We then simulate all steps that are required to prepare Bell pairs or graph
states, to purify them and to measure them. Depending on the measurement
results, states are kept or discarded. Several iterations of the protocols
are simulated.

The transmission through the perfect channels amounts to a simple relabeling:
The program remembers the new site where the qubit resides, as this indicates
which qubits can be subjected to joint operations.

² There is an algorithm for simulating the evolution of a rather restricted
class of mixed states [AG04], which is, however, not general enough for our
purposes.




5. Publication: Quantum communication cost of preparing ... 


Simulating the channel noise is done by randomizing over many simulation runs as follows. The three noisy channels that we have considered, Eqs. (5.11-5.13), are simulated using a pseudo-random number generator (RNG). Whenever noise is to be applied to a qubit, a random number between 0 and 1 is generated, and if it is smaller than the noise level $(1 - q)$, a $\sigma_x$ ($\sigma_z$) is applied for bit-flip (phase-flip) noise. For depolarizing noise, the RNG is used again to obtain an integer between 1 and 4, which determines which of the operators $\mathbb{1}$, $\sigma_x$, $\sigma_y$, $\sigma_z$ to apply.
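The sampling rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_noise_operator`, the channel labels and the string labels for the Pauli operators are hypothetical and not taken from the simulator itself, and applying the returned operator to the simulated register is left out.

```python
import random

# Labels for the operators 1, sigma_x, sigma_y, sigma_z (hypothetical naming).
PAULIS = ("I", "X", "Y", "Z")

def sample_noise_operator(channel, q, rng):
    """Pick the operator applied to one qubit in one noise event.

    q is the reliability of the channel, so (1 - q) is the noise level.
    """
    # First draw: a number in [0, 1); noise acts only if it is below (1 - q).
    if rng.random() >= 1.0 - q:
        return "I"
    if channel == "bit_flip":
        return "X"
    if channel == "phase_flip":
        return "Z"
    if channel == "depolarizing":
        # Second draw: an integer between 1 and 4 selects one of the
        # operators 1, sigma_x, sigma_y, sigma_z.
        return PAULIS[rng.randint(1, 4) - 1]
    raise ValueError(f"unknown channel: {channel}")

# Example: ten noise events on a depolarizing channel with 10% noise.
rng = random.Random(1234)
events = [sample_noise_operator("depolarizing", 0.9, rng) for _ in range(10)]
```

Passing the generator explicitly keeps individual runs reproducible, which is convenient when many runs are later combined into one statistical estimate.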

After the preparation of $M$ initial states, $m$ iterations of the protocol and, for the BEPP case, connection of the purified pairs, $N_f$ final states remain.

The yield is then given by $Y = N_f / M$. This is, however, not a good estimate for the asymptotic yield in the limit of infinite ensembles, for the following reason: If the number $N_{i-1}$ of states at the beginning of purification step $i$ is odd, we have to discard one state, because we can only deal with pairs of states. Hence, we better estimate the yield by

$$Y = \frac{N_1}{\lfloor\lfloor N_0 \rfloor\rfloor}\,\frac{N_2}{\lfloor\lfloor N_1 \rfloor\rfloor}\cdots\frac{N_f}{\lfloor\lfloor N_{m-1} \rfloor\rfloor} \tag{5.31}$$

(with $M = N_0$, $N_f = N_m$, and $\lfloor\lfloor N \rfloor\rfloor = N$ for even $N$, $\lfloor\lfloor N \rfloor\rfloor = N - 1$ for odd $N$).
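As a small illustration of this corrected estimate, the following Python sketch multiplies the per-step ratios, rounding each ensemble size down to the nearest even number before dividing. The function names and the list-of-counts interface are assumptions made for this example only.

```python
def round_down_to_even(n):
    """Return n for even n and n - 1 for odd n."""
    return n - (n % 2)

def yield_estimate(counts):
    """Estimate the yield from ensemble sizes [N_0, N_1, ..., N_m].

    Each step contributes the ratio of the states kept after the step to
    the number of states that could actually be paired before the step.
    """
    estimate = 1.0
    for before, after in zip(counts, counts[1:]):
        usable = round_down_to_even(before)
        if usable == 0:
            return 0.0
        estimate *= after / usable
    return estimate

# With an odd initial ensemble, the naive ratio N_f / M is slightly too small.
naive = 15 / 101
corrected = yield_estimate([101, 40, 15])
```

For large ensembles the two numbers agree closely; the correction matters mostly after several iterations, when few states are left.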

The fidelity can be determined by measuring the final states in the graph basis. This is because all the intended operations and the random noise operations map graph states onto graph states, so that all $N_f$ final states are of the form $|G, p\rangle$. The index $p$ can be determined as follows: For each state, the graph state creation operation of Eq. (5.2), $\prod_{\{a,b\}\in E} \Lambda Z^{(ab)}$ (which is Hermitian), is applied again to the state. If one then applies Hadamard gates to all qubits and measures in the $\sigma_z$ basis, the measurement results spell out the index vector $p$. As we intended to create $|G, \mathbf{0}\rangle$ states, we call the number of states for which we measured $\mathbf{0}$ the number $N_g$ of "good" states and hence estimate the fidelity as

$$F = \frac{N_g^{\mathrm{tot}}}{N_f^{\mathrm{tot}}} \pm \sqrt{\frac{N_g^{\mathrm{tot}}\,(N_f^{\mathrm{tot}} - N_g^{\mathrm{tot}})}{(N_f^{\mathrm{tot}})^3}}.$$

The superscript "tot" indicates that many runs of the simulations are made and that the numbers are the sums of the numbers in the individual runs. The uncertainty term follows from the expectation that, given a true fidelity $F_t$, the number of good states $N_g^{\mathrm{tot}}$ output by the Monte Carlo simulations after many runs is distributed according to a binomial distribution with length $N_f^{\mathrm{tot}}$, hit probability $F_t$ and hence standard deviation $\sqrt{N_f^{\mathrm{tot}} F_t (1 - F_t)}$. Thus, $N_g^{\mathrm{tot}} / N_f^{\mathrm{tot}}$ is the estimate for $F_t$ with the given statistical uncertainty at the $1\sigma$ level.

In the same way, the yield can be assigned an uncertainty,³

$$Y = \frac{N_f^{\mathrm{tot}}}{M^{\mathrm{tot}}} \pm \sqrt{\frac{N_f^{\mathrm{tot}}\,(M^{\mathrm{tot}} - N_f^{\mathrm{tot}})}{(M^{\mathrm{tot}})^3}}.$$

The $1\sigma$ uncertainties are indicated by error bars in the plots.
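Both estimates are a ratio of counts with a binomial error bar, so a single helper covers them. The sketch below is an illustration only; the function name and the example counts are made up for this purpose and are not results from the simulations.

```python
import math

def ratio_with_uncertainty(hits, trials):
    """Return (estimate, one-sigma uncertainty) for a binomial ratio.

    The uncertainty sqrt(hits * (trials - hits) / trials**3) is the same
    as sqrt(p * (1 - p) / trials) with p = hits / trials.
    """
    estimate = hits / trials
    sigma = math.sqrt(hits * (trials - hits) / trials**3)
    return estimate, sigma

# Fidelity: good states among the final states (illustrative numbers).
fidelity, fidelity_err = ratio_with_uncertainty(hits=900, trials=1000)

# Yield: final states among the initial states (illustrative numbers).
yield_value, yield_err = ratio_with_uncertainty(hits=1000, trials=64000)
```

Because the counts are summed over all runs before the ratio is taken, adding runs shrinks the error bar roughly with the inverse square root of the total number of states.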

5.4.2 Extremal strategies 

We now present the results obtained for the two extremal strategies described in Secs. 5.3.2 and 5.3.3 with the following parameters: The qubits are distributed through noisy channels, and each step of the protocol requires imperfect two-qubit operations. The noise considered is depolarizing noise as defined in Eq. (5.13) with reliability $p = 0.9$ (10% noise) for the channels and $p_l = 0.99$ (1% noise) for the local operations. We used the Monte Carlo simulation method described in Sec. 5.4.1 to reach a precision on the fidelity varying from 1‰ to 1%, depending on the size of the states and the number of iterations.

For the MEPP case, one has to decide which sequence of the sub-protocols P1 and P2 to use. The alternating sequence, P1-P2-P1-..., turns out not to be optimal in terms of yield and fidelity, neither for GHZ nor for cluster states. To find the optimum, one might hence consider simulating, after each step, both sub-protocols, and then continuing with the better one. Somewhat surprisingly, this leads to worse results (see Fig. 5.5). Thus, to find the optimal sequence of $m$ protocol steps, one would need to try all $2^m$ possibilities. As this is not practical, we decided to stick with the alternating sequence, which, though not optimal, turned out to give very decent performance. For GHZ states, there is also a difference between the alternating sequences P1-P2-P1-... and P2-P1-P2-..., due to the asymmetry of the sets $V_A$ (containing only Alice's qubit) and $V_B$ (containing the rest). Starting with P1 works better, and this is what we use in all plots discussed in this section.

GHZ states 

We start with the results obtained for GHZ states. We made our simulations for states of 3 to 10 qubits and a maximum number of steps varying from 5 to 7. As an example, Fig. 5.6 shows the quantum communication cost as a function of the desired fidelity for 5-qubit GHZ states. The data points are the outputs for 1 to 6 steps of the protocol. These plots allow us to determine, for a given fidelity, the strategy that gives the best yield (lowest communication cost).

³ To be precise, we should calculate the uncertainty not from $Y = N_f / M$, but following Eq. (5.31). This simplification is, however, justified, as it only increases the uncertainty estimate, and this only slightly.
Figure 5.5: Comparison of the inverse of the communication cost and the fidelity obtained after 1 to 5 steps for 3-qubit GHZ states. The red solid line stands for the BEPP strategy, the green dashed line for the alternating sequence of MEPP subprotocols beginning with P1 and the blue small-dashed line for the alternating sequence beginning with P2.



Figure 5.6: Inverse of communication cost for different target fidelities for 5-qubit GHZ states with $p = 0.9$ and $p_l = 0.99$ (where $p$ is the reliability defined in Eq. (5.14)). The data points are the outputs for 1, 2, ..., 6 iterations of the protocol. The connecting lines are obtained by mixing ensembles of different fidelities according to Eq. (5.29). The red solid line gives the values obtained with the MEPP and the green dashed line those obtained with the BEPP. The gain in fidelity from one step to the next becomes smaller at each step. From 6 to 7 steps, the gain in fidelity is smaller than the uncertainty in both the BEPP and the MEPP strategy. We consider this value as the maximal reachable fidelity.
Figure 5.7: Maximal reachable fidelity as a function of $N$ for (a) GHZ and (b) cluster states for the bipartite (green x) and multipartite (red bars) strategies with reliabilities (cf. Eq. (5.14)) $p = 0.9$ for channel transmission and $p_l = 0.99$ for local operations. The final fidelity is estimated as follows: For a given number of parties, we iterated the protocol as long as we obtained an increase of fidelity larger than the uncertainty (typically 1%). We took the last value as the maximal fidelity and assigned its uncertainty to the maximal reachable fidelity. One sees here the main difference in behavior between GHZ and cluster states. In the first case, there is a range where the multipartite strategy is better than the bipartite one for a number of parties strictly smaller than 10. For more parties, the multipartite protocol fails because of the fragility of GHZ states against noise. On the other hand, the robustness of cluster states allows us to purify them even for a large number of parties. The range of fidelity where MEPP is superior increases with the number of parties.



After 6 iterations, the increase in fidelity obtained by an additional step is smaller than the chosen precision of 1%. We therefore take this value as an estimate of the maximum reachable fidelity. A comparison of the maximal reachable fidelity for both strategies as a function of the number of parties (Fig. 5.7a) shows that the maximal reachable fidelity is higher in the MEPP case for a number of parties strictly smaller than 10. In this case, there is a transition value of the target fidelity above which the MEPP strategy gives a better yield. We will refer to the pair of fidelity and communication cost where this transition happens as the cross-over point. Fig. 5.8 presents the yield as a function of fidelity for $N = 3$ and $N = 10$ as well as the cross-over points for intermediate numbers of parties. $N = 9$ is the highest number of qubits for which there is a cross-over point. For higher numbers of parties, the BEPP strategy is always better. This is because of the fragility of GHZ states against noise for large particle numbers [DB04, HDB05]. The communication cost and fidelity of the cross-over points as a function of the number of parties are presented in Fig. 5.9a.
Figure 5.8: Inverse of communication cost for different target fidelities of GHZ states of 3 to 10 qubits with alteration probabilities (as in Fig. 5.7) $1 - p = 0.1$ and $1 - p_l = 0.01$. The dashed green line stands for 3-qubit GHZ states and the BEPP strategy, the red solid line for 3-qubit GHZ states and the MEPP strategy, the blue small-dashed line for 10-qubit GHZ states and the BEPP strategy, and the pink dotted line for 10-qubit GHZ states and the MEPP strategy. The blue squares give the cross-over points, i.e., the fidelity where MEPP becomes more efficient than BEPP, for $N = 3, 5, 6, 7, 8$ and $9$. For $N = 3$ and $N = 10$, the purification curves are plotted as well. For $N = 3$, they cross at the corresponding blue square.
Figure 5.9: Inverse of communication cost (green x) and fidelity (red +) of the cross-over point depending on the number of parties for (a) GHZ and (b) cluster states. These values are obtained using the Monte Carlo method and are therefore subject to statistical errors. The cross-over indicates the target fidelity above which the MEPP strategy is more efficient than the BEPP strategy. Note the log scale for the inverse of communication cost. In the GHZ case, there is no cross-over for more than 9 parties.
Figure 5.10: Inverse of communication cost for different target fidelities for 3
(red solid line for MEPP and green dashed line for BEPP) and 15 (pink dot¬ 
ted line for MEPP and blue small dashed line for BEPP) qubit cluster states. 
The data points are the outputs for 1, 2, 3, ... iterations of the protocol. The 
intermediate points are obtained by mixing ensembles of different fidelities. 
For more than 6 steps, the difference between the reached fidelity and the 
maximum reachable fidelity is smaller than the uncertainty. For any number 
of parties, the curves representing the two strategies cross over. The disks 
give this cross-over for N = 3,4,5,6,7,8,9,10,15. (That one curve seems to 
"go back" is just an artifact of the statistical inaccuracies of the Monte Carlo 
method.) 



Cluster states 

Next, we simulated cluster states using the same parameters as for the GHZ states. The results are quite different.

We made our simulations for states of 3 to 15 qubits. In this range, as one can see in Fig. 5.10, there is always a cross-over point. This is in stark contrast to the GHZ case. This main difference in behavior between these two kinds of states is due to the much higher robustness of cluster states against noise [HDB05]. Moreover, the range of target fidelity for which the multipartite strategy is the only one available increases with the number of parties, as shown in Fig. 5.7b. In Fig. 5.9b, we present the fidelity and communication cost of the cross-over point. Both values decrease with the number of parties. This is due to the increasing cost of producing bigger and bigger states and also to the fact that we consider here the global fidelity and not the LNE presented in Sec. 5.2.3.

5.4.3 Intermediate strategies 

Since switching from BEPP to MEPP can result in such striking differences 
in yield, one might expect that, especially near the break-even point, certain 
Mn    prepare an n-qubit cluster or GHZ state
B2    prepare a Bell pair
S     send the states through the channels
P1    apply multipartite purification protocol P1
P2    apply multipartite purification protocol P2
Pb    apply bipartite protocol
Cℓ    connect ℓ states to a larger one


Table 5.1: Legend for the instruction strings in Figs. 5.11 and 5.13. 


intermediate strategies, mixing characteristics of BEPP and MEPP, might perform even better. After all, in the BEPP case, one purifies small states (with only 2 qubits) and then connects them, while in the MEPP scenario, the states are first connected to large units, which are then purified. One can also connect pairs to states of intermediate size, purify these, connect them to the desired full size, and perhaps purify again. This can be seen, e.g., in Fig. 5.11. In this figure, we have simulated many different strategies, which are described in short by instruction strings that are processed from left to right and tell the software in which order which preparations, transmissions, connections or purifications should be simulated (cf. Table 5.1).
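To make the format of these instruction strings concrete, here is a minimal Python sketch of how such a string could be split into steps. It is not the parser used in the simulation software; the function name, the tuple representation and the example string containing a connection step are assumptions for illustration, and only the token names come from Table 5.1.

```python
import re

# Tokens without a numeric argument, as listed in Table 5.1.
SIMPLE_TOKENS = {"B2", "S", "P1", "P2", "Pb"}

def parse_instructions(text):
    """Split an instruction string into (operation, argument) tuples.

    "M<n>" (prepare an n-qubit state) and "C<l>" (connect l states) carry
    a number; all other tokens have no argument (None).
    """
    steps = []
    for token in text.split("-"):
        match = re.fullmatch(r"([MC])(\d+)", token)
        if match:
            steps.append((match.group(1), int(match.group(2))))
        elif token in SIMPLE_TOKENS:
            steps.append((token, None))
        else:
            raise ValueError(f"unknown instruction: {token!r}")
    return steps

# A full MEPP strategy with three purification steps.
mepp = parse_instructions("M13-S-P1-P2-P1")

# A hypothetical intermediate strategy: purify 5-qubit states, then connect 3.
intermediate = parse_instructions("M5-S-P1-P2-C3")
```

Processing the resulting list from left to right mirrors the order in which the preparations, transmissions, purifications and connections are simulated.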

It can be seen that for low fidelities and high yields (left side of the plots), the BEPP case is best, as already seen above, and for high fidelities and low yields (right margin of the plots), MEPP catches up. In the middle region, one may indeed increase the performance by first preparing small states of, say, 4, 5 or 7 qubits, purifying them, and then connecting them to the desired 13-qubit state. (Do not be confused by the appearance of "M13-S" at the left margin. This looks like MEPP, but is not, as it contains no purification at all. Also note that there is a subtle difference between using the BEPP protocol (denoted "B2-S-Pb-...") and using the MEPP protocol on the $|G_2\rangle$ state (denoted "M2-S-P1-..." or "M2-S-P2-..."), with the former performing better.)

Of course, only discrete ways of assembling the desired states from equal smaller states are available. Recall that connecting $L$ states of $n$ qubits will give a state of

$$N = Ln - (L - 1) \tag{5.32}$$

qubits because $(L - 1)$ qubits have to be measured in the connection process. In the plots, we have taken all possible values of $L$ for the given state size $N$ and calculated data points for the corresponding strategies with up to four purification steps. The blue curve in the plot marks the optimum that can be achieved using these strategies and mixing them as described in Section 5.3.4.
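The admissible combinations follow directly from Eq. (5.32): since $N - 1 = L(n - 1)$, the size $n$ of the building blocks must satisfy that $n - 1$ divides $N - 1$. The short Python sketch below enumerates them; the function name is chosen here for illustration only.

```python
def decompositions(total):
    """List all (L, n) with n >= 2 such that L * n - (L - 1) == total."""
    pairs = []
    for n in range(2, total + 1):
        # N - 1 = L * (n - 1), so (n - 1) has to divide (N - 1).
        if (total - 1) % (n - 1) == 0:
            pairs.append(((total - 1) // (n - 1), n))
    return pairs

# Building blocks for a 13-qubit state:
# [(12, 2), (6, 3), (4, 4), (3, 5), (2, 7), (1, 13)]
blocks_13 = decompositions(13)
```

For 13 qubits this yields block sizes 2, 3, 4, 5, 7 and 13, which includes the sizes 4, 5 and 7 mentioned above; for 31 qubits it yields 2, 3, 4, 6, 7, 11, 16 and 31.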

To demonstrate the efficiency of our procedure, we also considered pu- 


Figure 5.11: Examples for the use of intermediate purification strategies, here for 13-qubit (a) cluster and (b) GHZ states. Plotted is the inverse of communication cost as a function of final fidelity. See Table 5.1 for the meaning of the instruction strings. The blue curve marks the maximal achievable yield for a given desired fidelity and is obtained by connecting the optimal strategies with curves according to Eq. (5.28). Noise levels are $(1 - q) = 0.1$ for the channels and $(1 - q_l) = 0.01$ for local operations. In (a), one can see well, following the dark blue line, how for small target fidelity... [cont. on page 102]



Figure 5.12: [cont. from page 101] ...BEPP ("M2-S-...") gives the best yield, while for high fidelities ($F > 0.9$), distributing larger and larger states becomes advantageous. (For even higher fidelities, one expects the full MEPP strategy, i.e., "M13-S-...", to appear on the blue curve. However, this will happen at communication costs larger than the scales shown on the plot, which ends with "M13-S-P1-P2-P1", i.e., MEPP with only three purification steps.) In (b), the picture is not as clear, as the GHZ states already start to deteriorate under the given level of local noise.



Figure 5.13: Production of 31-qubit cluster states using intermediate strategies. The "instruction strings" are explained in Table 5.1. Data points with the same number of purification steps are plotted in the same color. Note how the distribution of initially larger states becomes advantageous for higher target fidelities. Noise levels are $(1 - q) = 0.1$ for the channels and $(1 - q_l) = 0.01$ for local operations. (Note also that the data points at the low end of the plot have few purification steps: Those strategies beginning with "M16" or "M31" that should appear on the curve of optimal strategies are again, as in Fig. 5.11, beyond the range of the plot.)
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rification of cluster states of 31 qubits (Fig. 5.13). In order to allow for easier 
comparison, Fig. 5.13, as well as Fig. 5.11, shows a re-gauging of the fidelity axis to the so-called local noise equivalent (LNE) described in Sec. 5.2.3.


5.5 Analytical treatment for a simplified model

For a better understanding of the numerical results, we now develop an analytical treatment for both BEPP and MEPP. To make this task feasible, we have to restrict ourselves to a simplified noise model. We only consider GHZ states.

As before, we define two sets $V_A$ and $V_B$ corresponding to the bi-colouration of the graph. $V_A$ is the set containing only one qubit, namely the central vertex, which is connected to all the others, and $V_B$ contains the rest.

In the toy models presented below, the central party, called Alice, wants to share an $N$-qubit GHZ state with $(N - 1)$ partners. Depending on the strategy, the initial states are either Bell pairs or GHZ states, which are noisy due to the transmission through the channels. First, in Section 5.5.1, we consider local operations to be perfect. We will see that this fails to reproduce features seen in the numerical results. Hence, we extend our model, in Section 5.5.2, such that it incorporates local noise.

5.5.1 Perfect local operations 

To start, we assume the local operations to be perfect. Of the channels we considered in Eqs. (5.11-5.13), only bit-flip channels and phase-flip channels allow for a simple analytical treatment. We present the calculation for phase-flip channels. The calculation and the results for bit-flip channels are very similar; we have hence not included them in the paper.

BEPP strategy 

Following the BEPP scenario described in Sec. 5.3.2, Alice sends one qubit of each entangled pair $\rho = |G_2; 0,0\rangle\langle G_2; 0,0|$ of her initial ensemble through the channel to party $B_k$, obtaining

$$\rho = q\,|G_2; 0,0\rangle\langle G_2; 0,0| + (1 - q)\,|G_2; 0,1\rangle\langle G_2; 0,1|.$$

She then applies the BEPP. The state of the pairs that are kept after one step is given by (see Eq. (5.19))

$$\rho = \frac{q^2\,|G_2; 0,0\rangle\langle G_2; 0,0| + (1 - q)^2\,|G_2; 0,1\rangle\langle G_2; 0,1|}{q^2 + (1 - q)^2}. \tag{5.33}$$

In this step, the probability of keeping the source state after the measurement of the other state is $k^{\mathrm{BEPP}}(q, 1) = q^2 + (1 - q)^2$ (the probability of obtaining the same measurement outcomes). We denote by $f^{\mathrm{BEPP}}(q, m)$ the fidelity after $m$ steps. The quantity $k(q, m)$ is called the success probability in step $m$. Note that the ratio of the ensemble sizes after and before the step is given by $k(q, m)/2$, as one half of the states are measured and discarded. One obtains the total yield $Y^{\mathrm{BEPP}}$ after $m$ steps by multiplying these ratios for the individual steps. By iterating the protocol over $m$ steps, one finds

$$f^{\mathrm{BEPP}}(q, m) = \frac{q^{2^m}}{q^{2^m} + (1 - q)^{2^m}},$$

$$k^{\mathrm{BEPP}}(q, m) = \frac{q^{2^m} + (1 - q)^{2^m}}{\left[q^{2^{m-1}} + (1 - q)^{2^{m-1}}\right]^2},$$

$$Y^{\mathrm{BEPP}}(q, m) = \prod_{i=1}^{m} \frac{k^{\mathrm{BEPP}}(q, i)}{2} = \frac{q^{2^m} + (1 - q)^{2^m}}{2^m \prod_{i=1}^{m-1}\left(q^{2^i} + (1 - q)^{2^i}\right)}.$$

After the bipartite purification, Alice connects $(N - 1)$ pairs to produce an $N$-qubit GHZ state. To connect two pairs, she applies a controlled phase gate (Eq. (5.3)) followed by a $\sigma_y$ measurement on one of the two qubits just connected (cf. Fig. 5.3). This procedure is repeated $(N - 1)$ times between different pairs of parties $(A, B_k)$, $k = 1, \ldots, N - 1$, in order to obtain the $N$-qubit GHZ state.

Note that the qubits that Alice connects have not been sent through channels and are hence unaffected by channel noise. Thus, it does not matter whether we first apply the superoperator for the channel noise and then the one for the local noise due to the connection process, or vice versa. This means that the final state is obtained by applying noise to all qubits of the GHZ state that do not belong to Alice. This leads to a fidelity

$$F^{\mathrm{BEPP}}(N, q, m) = \left[f^{\mathrm{BEPP}}(q, m)\right]^{N-1}$$

and (as the channels are used $N - 1$ times to create one $N$-qubit GHZ state) a quantum communication cost

$$C^{\mathrm{BEPP}} = \frac{N - 1}{Y^{\mathrm{BEPP}}(q, m)}.$$

(5.34) 

(5.35) 

(5.36) 
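As a numerical cross-check of the BEPP expressions for phase-flip channels and perfect local operations, the following Python sketch evaluates the fidelity, the success probability and the yield, and compares the step-by-step product with the closed form. It is an illustration only; all function names are chosen here and do not refer to existing code.

```python
def _d(q, i):
    """Shorthand for q^(2^i) + (1 - q)^(2^i); equals 1 for i = 0."""
    return q ** (2 ** i) + (1 - q) ** (2 ** i)

def bepp_fidelity(q, m):
    """Fidelity of a pair after m purification steps."""
    return q ** (2 ** m) / _d(q, m)

def bepp_success(q, m):
    """Probability that a pair is kept in step m."""
    return _d(q, m) / _d(q, m - 1) ** 2

def bepp_yield(q, m):
    """Product of the per-step ratios k/2 over m steps."""
    result = 1.0
    for i in range(1, m + 1):
        result *= bepp_success(q, i) / 2
    return result

def bepp_yield_closed(q, m):
    """Closed form of the same product."""
    denominator = 2.0 ** m
    for i in range(1, m):
        denominator *= _d(q, i)
    return _d(q, m) / denominator

def ghz_fidelity_bepp(n_parties, q, m):
    """Fidelity of the N-qubit GHZ state after connecting N - 1 pairs."""
    return bepp_fidelity(q, m) ** (n_parties - 1)

def communication_cost_bepp(n_parties, q, m):
    """Channel uses per final GHZ state."""
    return (n_parties - 1) / bepp_yield(q, m)

# Example: 5 parties, 10% channel noise, 3 purification steps.
example_fidelity = ghz_fidelity_bepp(5, 0.9, 3)
example_cost = communication_cost_bepp(5, 0.9, 3)
```

The fidelity function also satisfies the one-step recursion $f \mapsto f^2 / (f^2 + (1 - f)^2)$, which is a quick way to convince oneself that the exponents $2^m$ are right.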


MEPP strategy 

In the MEPP setting, Alice prepares an $N$-qubit GHZ state locally and distributes it through dephasing (phase-flip) channels to her $(N - 1)$ partners. We then have the state

$$\rho^{(0)} = \Big(\prod_{b \in V_B} \mathcal{E}_z^{(b)}\Big)\, |G_*; 0, \mathbf{0}\rangle\langle G_*; 0, \mathbf{0}|, \tag{5.37}$$

where $\mathcal{E}_z$ is the phase-flip channel defined in Eq. (5.11). We shall from now on suppress the symbol $G_*$, which indicates the $N$-vertex star graph of Fig. 5.2.

We shall see that all states that we encounter have the form

$$\rho^{(m)} = r_0^{(m)}\,|0, 0\ldots0\rangle\langle 0, 0\ldots0| + r_1^{(m)} \sum_{k=2}^{N} |0, 0\ldots0\overset{k}{1}0\ldots0\rangle\langle 0, 0\ldots0\overset{k}{1}0\ldots0| + r_2^{(m)} \sum_{\substack{k_1, k_2 = 2 \\ k_1 < k_2}}^{N} |0, \ldots0\overset{k_1}{1}0\ldots0\overset{k_2}{1}0\ldots\rangle\langle 0, \ldots0\overset{k_1}{1}0\ldots0\overset{k_2}{1}0\ldots| + \cdots + r_{N-1}^{(m)}\,|0, 11\ldots1\rangle\langle 0, 11\ldots1|, \tag{5.38}$$

where $r_j^{(m)}$ denotes the coefficient in front of the terms with $j$ entries "1" after the $m$th step of the purification protocol. These states are diagonal in the graph state basis and symmetric with respect to permutations of the qubits in set $V_B$. They are hence characterized by only $N$ coefficients $r_0^{(m)}, \ldots, r_{N-1}^{(m)}$.

We start by carrying out the application of the superoperator in Eq. (5.37). Indeed, one obtains a mixture of the form (5.38) with coefficients

$$r_j^{(0)} = q^{N-1-j}\,(1 - q)^j. \tag{5.39}$$


As only set $V_B$ is affected by the noise, subprotocol P2 is sufficient to purify the state. Following [ADB05], one sees that after each step of the subprotocol, the state is changed such that each coefficient becomes proportional to the square of its former value, i.e.,

$$r_j^{(m)} = \frac{\left[r_j^{(m-1)}\right]^2}{\sum_{i=0}^{N-1} \binom{N-1}{i} \left[r_i^{(m-1)}\right]^2}. \tag{5.40}$$

Inserting Eq. (5.39), one gets for the first step (using the binomial theorem)

$$r_j^{(1)} = \frac{q^{2(N-1-j)}\,(1 - q)^{2j}}{\left[q^2 + (1 - q)^2\right]^{N-1}},$$

and iterating the formula, one finds

$$r_j^{(m)} = \frac{q^{2^m (N-1-j)}\,(1 - q)^{2^m j}}{\left[q^{2^m} + (1 - q)^{2^m}\right]^{N-1}}.$$
The fidelity of the state at step $m$ can now be read off:

$$F^{\mathrm{MEPP}}(N, q, m) = r_0^{(m)} = \left[\frac{q^{2^m}}{q^{2^m} + (1 - q)^{2^m}}\right]^{N-1}.$$

Note that this is the same expression as we got before for the BEPP: $F^{\mathrm{MEPP}}(N, q, m) = F^{\mathrm{BEPP}}(N, q, m)$.

To calculate the yield, we need the success probability $k^{\mathrm{MEPP}}(N, q, m)$ that a state is kept. Using a similar argument as we did for Eq. (5.40), we find

$$k^{\mathrm{MEPP}}(N, q, m) = \sum_{i=0}^{N-1} \binom{N-1}{i} \left[r_i^{(m-1)}\right]^2 = \left(\frac{q^{2^m} + (1 - q)^{2^m}}{\left[q^{2^{m-1}} + (1 - q)^{2^{m-1}}\right]^2}\right)^{N-1}.$$

From this, we can find the yield as before:

$$Y^{\mathrm{MEPP}}(N, q, m) = \prod_{i=1}^{m} \frac{k^{\mathrm{MEPP}}(N, q, i)}{2} = 2^{-m} \left[\frac{q^{2^m} + (1 - q)^{2^m}}{\prod_{i=1}^{m-1}\left(q^{2^i} + (1 - q)^{2^i}\right)}\right]^{N-1}. \tag{5.41}$$

Comparing to the BEPP case, one sees that

$$Y^{\mathrm{MEPP}}(N, q, m) = 2^{m(N-2)} \left[Y^{\mathrm{BEPP}}(q, m)\right]^{N-1}. \tag{5.42}$$
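The statements of this subsection can be checked numerically by iterating the squaring map on the coefficients directly. The Python sketch below does this under the same assumptions as the text (phase-flip channels, perfect local operations); the function names are illustrative only, and the BEPP yield is recomputed inside the block so that it is self-contained.

```python
from math import comb

def bepp_yield(q, m):
    """BEPP yield for phase-flip channels (product of k/2 over m steps)."""
    result = 1.0
    for i in range(1, m + 1):
        numerator = q ** (2 ** i) + (1 - q) ** (2 ** i)
        denominator = (q ** (2 ** (i - 1)) + (1 - q) ** (2 ** (i - 1))) ** 2
        result *= numerator / denominator / 2
    return result

def mepp_fidelity_and_yield(n_parties, q, m):
    """Iterate the coefficient map m times; return (fidelity, yield)."""
    size = n_parties - 1  # number of qubits in V_B
    # Initial coefficients r_j for states with j entries "1".
    r = [q ** (size - j) * (1 - q) ** j for j in range(n_parties)]
    total_yield = 1.0
    for _ in range(m):
        # Success probability: weighted sum of the squared coefficients.
        success = sum(comb(size, i) * r[i] ** 2 for i in range(n_parties))
        r = [value ** 2 / success for value in r]
        total_yield *= success / 2
    return r[0], total_yield

# Compare MEPP and BEPP for 5 parties, 10% channel noise, 3 steps.
fidelity, mepp_yield = mepp_fidelity_and_yield(5, 0.9, 3)
relation = 2 ** (3 * (5 - 2)) * bepp_yield(0.9, 3) ** (5 - 1)
```

In this example `mepp_yield` and `relation` should agree up to rounding, which is the content of the comparison between the two yields.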


Conclusion 

In the particular case of dephasing channels and perfect local operations, both strategies lead to the same fidelity after iterating the protocol for the same number of steps. However, they differ in the communication cost. As one sees from Eq. (5.42), the yield of the MEPP strategy is always lower (and the communication cost hence larger). This can be explained by the higher probability of throwing away states at each step, which increases even further with the number of parties.

We have also done analytical calculations for bit-flip channels (Eq. (5.12)) and numerical simulations for depolarizing channels (Eq. (5.13)) (always in the case of perfect local operations) and found a similar behavior.

In order to see regions where MEPP is superior, as we did with the Monte Carlo simulations, it is hence necessary to give up the simplifying assumption of noiseless local operations.

5.5.2 Imperfect local operations 

If local operations are not assumed to be perfect, results are quite different. We again consider GHZ states of arbitrary size, to be purified with the BEPP or MEPP strategy.

For the noise, we define a model which is simple enough to allow for analytical calculations but still shows the general features obtained numerically (in particular, it shows cross-over points): The channel through which Alice sends qubits to the other parties is the phase-flip channel of Eq. (5.11) with alteration probability $(1 - q)$. The imperfection of the local gates is modeled by bit-flip noise (Eq. (5.12)) for Alice's operations and phase-flip noise for all other parties, always with alteration probability $(1 - q_l)$.

MEPP strategy 

As before, Alice prepares perfect $N$-qubit GHZ states and distributes them using the channels. We get the same initial state $\rho^{(0)}$ as before, again of the form (5.38) with coefficients as in Eq. (5.39). We shall see that, again, the form (5.38) is preserved by the purification steps even though they are now assumed to be noisy.

The values of the $r_j$ are changed according to a linear map:

$$r_j \mapsto \sum_{k=0}^{N-1} \Lambda_{jk}\, r_k.$$

We shall construct this map in two steps. First, we see how the phase-flip noise of the local gates acting on the qubits in $V_B$,

$$\prod_{b \in V_B} \mathcal{E}_z^{(b)}(q_l), \tag{5.43}$$

changes the coefficients, and denote the map corresponding to this action by $\tilde{\Lambda}$:

$$r'_j = \sum_{k=0}^{N-1} \tilde{\Lambda}_{jk}\, r_k.$$

Then, we consider the action of the bit-flip noise on Alice's qubit to get the full map $\Lambda$.

For the first step, we call a state $|G_*, p\rangle$ a $k$-state if $p$ starts with a 0 (for the central qubit in $V_A$) and contains, within the indices corresponding to $V_B$, $k$ entries "1" and $(N - 1 - k)$ entries "0". We can now calculate the probability $P_{j \leftarrow k}$ that the superoperator (5.43) changes a pure $k$-state to any $j$-state: Say, $s$ of the $k$ entries "1" are flipped to "0". Then $s' = j - k + s$ of the $(N - 1 - k)$ entries "0" have to be flipped to "1". Hence,

$$P_{j \leftarrow k} = \sum_{s=0}^{k} \binom{k}{s} (1 - q_l)^{s}\, q_l^{\,k-s} \binom{N-1-k}{s'} (1 - q_l)^{s'}\, q_l^{\,N-1-k-s'}, \tag{5.44}$$

where terms with $s' < 0$ or $s' > N - 1 - k$ vanish. There are $\binom{N-1}{j}$ $j$-states and $\binom{N-1}{k}$ $k$-states, and so
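$$\tilde{\Lambda}_{jk} = \frac{\binom{N-1}{k}}{\binom{N-1}{j}}\, P_{j \leftarrow k},$$

which can also be written in terms of Gauss's hypergeometric function ${}_2F_1$ as (for $j \ge k$)

$$\tilde{\Lambda}_{jk} = \binom{j}{k}\; {}_2F_1\!\left(-k,\; 1 - N + j;\; 1 + j - k;\; \frac{(1 - q_l)^2}{q_l^2}\right) (1 - q_l)^{j-k}\, q_l^{\,N-1-j+k}. \tag{5.45}$$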
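The counting argument for the transition probability can be checked with a few lines of Python. The sketch below sums over the number $s$ of entries "1" that are flipped back, exactly as described in the text, and rescales by the numbers of $j$- and $k$-states to obtain the map acting on the per-state coefficients. The function names are chosen for this illustration, and the hypergeometric form is not used here.

```python
from math import comb

def transition_probability(j, k, n_parties, ql):
    """Probability that phase-flip noise on V_B maps a k-state to any j-state.

    Each of the N - 1 qubits in V_B is flipped independently with
    probability (1 - ql).
    """
    size = n_parties - 1
    total = 0.0
    for s in range(k + 1):          # s entries "1" are flipped to "0"
        s_prime = j - k + s         # entries "0" that must flip to "1"
        if 0 <= s_prime <= size - k:
            total += (comb(k, s) * (1 - ql) ** s * ql ** (k - s)
                      * comb(size - k, s_prime)
                      * (1 - ql) ** s_prime * ql ** (size - k - s_prime))
    return total

def lambda_tilde(j, k, n_parties, ql):
    """Map on per-state coefficients: rescale by the state counts."""
    size = n_parties - 1
    return comb(size, k) / comb(size, j) * transition_probability(j, k, n_parties, ql)

# For a single qubit in V_B (N = 2), a 0-state becomes a 1-state with
# probability (1 - ql).
single_flip = transition_probability(1, 0, 2, 0.95)
```

Two simple sanity checks are that the transition probabilities out of any $k$-state sum to one and that the rescaled map preserves the total weight $\sum_j \binom{N-1}{j} r_j$.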

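Now, we can do the second step and apply the noise for the imperfection of Alice's local gates, modeled by $\mathcal{E}_x(q_l)$. We get, due to Eq. (5.6),

$$r''_j = q_l\, r'_j + (1 - q_l)\, r'_{N-1-j} = \sum_{k=0}^{N-1} \Lambda_{jk}\, r_k,$$

where

$$\Lambda_{jk} = \frac{\binom{N-1}{k}}{\binom{N-1}{j}} \sum_{s=0}^{k} \binom{k}{s} \left[ \binom{N-1-k}{j-k+s}\, q_l^{\,N-1-j+k-2s+1}\, (1 - q_l)^{j-k+2s} + \binom{N-1-k}{j-s}\, q_l^{\,j+k-2s}\, (1 - q_l)^{N-1-j-k+2s+1} \right].$$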

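The fidelity and the yield corresponding to one step of the protocol can then be calculated:

$$f(N, q, q_l, m) = \frac{\left[\sum_{l=0}^{N-1} \Lambda_{0l}\, r_l\right]^2}{\sum_{k=0}^{N-1} \binom{N-1}{k} \left[\sum_{l=0}^{N-1} \Lambda_{kl}\, r_l\right]^2}.$$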

As before, the denominator of the previous expression is the success probability $k(N, q, q_l, m)$ of the step $m$,⁴ and we get the total yield $Y^{\mathrm{MEPP}}(N, q, q_l, m)$ by multiplying up the factors $k/2$ of all $m$ steps:

$$Y^{\mathrm{MEPP}}(N, q, q_l, m) = \prod_{i=1}^{m} \frac{k(N, q, q_l, i)}{2}.$$

(For the corresponding quantum communication cost, Eq. (5.27) has to be used.)


BEPP strategy 

Next, we find an analytical treatment for the BEPP strategy. In order to facilitate the calculation, we will consider the BEPP as a special case of the MEPP. We first show why this is possible without changing the results:

Considering the restricted noise model presented in this section, the state can always be written as a mixture of $|G_2; 0,0\rangle\langle G_2; 0,0|$ and
4 Note that A is dependent on m although we have suppressed this to keep notation simple. 
$|G_2; 0,1\rangle\langle G_2; 0,1|$ only. It can hence be purified using only subprotocol P2. In addition, the only difference between the BEPP protocol as described in Sec. 5.2.4 and subprotocol P2 is that the states $|G_2; 1,0\rangle$ and $|G_2; 1,1\rangle$ are exchanged in the former but not in the latter. As these states do not contribute in the present model, the two protocols give identical results.

The results obtained in the last section can be used to calculate the fidelity $f(q, q_l, m)$ and the yield $Y^{\mathrm{BEPP}}(q, q_l, m)$ before connection. After purification, Alice connects the qubits $a_1, \ldots, a_{N-1}$ of $(N - 1)$ pairs described by the states $\rho_k = q\,|G_2\rangle_k\langle G_2| + (1 - q)\,\sigma_z^{(b_k)} |G_2\rangle_k\langle G_2|\, \sigma_z^{(b_k)}$, respectively. The joint state of the pairs is given by


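$$\rho = q^{N-1} \bigotimes_{k=1}^{N-1} |G_2\rangle_k\langle G_2| + (1 - q)\, q^{N-2} \sum_{i=1}^{N-1} \sigma_z^{(b_i)} \Big(\bigotimes_{k=1}^{N-1} |G_2\rangle_k\langle G_2|\Big) \sigma_z^{(b_i)} + \ldots + (1 - q)^{N-1} \Big(\prod_{i=1}^{N-1} \sigma_z^{(b_i)}\Big) \Big(\bigotimes_{k=1}^{N-1} |G_2\rangle_k\langle G_2|\Big) \Big(\prod_{i=1}^{N-1} \sigma_z^{(b_i)}\Big).$$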

The connection is performed using the procedure described in Sec. 5.2.1. As the noise contains only $\sigma_z$ operators, it commutes with the projectors and correction operators that are used in this procedure. The state after projection is given by

$$\rho' = q^{N-1}\, |G_*\rangle\langle G_*| + (1 - q)\, q^{N-2} \sum_{b=2}^{N} \sigma_z^{(b)} |G_*\rangle\langle G_*|\, \sigma_z^{(b)} + \ldots + (1 - q)^{N-1} \Big(\prod_{b=2}^{N} \sigma_z^{(b)}\Big) |G_*\rangle\langle G_*| \Big(\prod_{b=2}^{N} \sigma_z^{(b)}\Big).$$

It follows that after the connection, the fidelity is given by

$$F^{\mathrm{BEPP}} = f(q, q_l, m)^{N-1}$$

and the quantum communication cost by

$$C^{\mathrm{BEPP}} = \frac{N - 1}{Y^{\mathrm{MEPP}}(2, q, q_l, m)}.$$


Discussion 

The results obtained in the way just explained are presented in Figs. 5.14, 5.15, and 5.16, always for MEPP and BEPP.

The cross-over points, i.e., the fidelity values (and corresponding costs) above which MEPP performs better than BEPP, are plotted as blue squares in Fig. 5.14 for up to 70 parties. We have also plotted the fidelity-cost function for selected numbers $N$ of parties; the MEPP and BEPP curves cross in their respective blue squares. The other cross-over points were determined in the same way by observing where the MEPP and the BEPP curve intersect. Compare this figure with Fig. 5.8: Our analytical toy model reproduces the appearance of cross-over points, which we had already discussed in Section 5.4.2, an essential feature observed for the more general noise model. It does not, however, reproduce the fact that a cross-over ceases to appear above a certain number (here: 9) of parties. This is due to the particular kind of noise of our toy model, under which GHZ states appear less fragile than under depolarizing noise, so that the break-down of MEPP for large states does not happen.

We can use our analytic model to explore the parameter space more thoroughly. For instance, one might be interested in how the positions of the cross-over points change if the local noise is increased. This is shown in Fig. 5.15, where the right-most curve is the same as the disks in Fig. 5.14 and the others are for higher local noise levels. Observe how the effect of local noise depends more and more on the state size as its level approaches the order of magnitude of the channel noise.

The vertical tails of the curves in Fig. 5.14 make it easy to read off the maximum reachable fidelity, which is plotted in Fig. 5.16. There, the advantage of MEPP over BEPP increases with the number of parties. This effect can also be seen in the numerical calculations for depolarizing noise (Fig. 5.7a). In the latter case, it is, however, soon overcome by the competing effect of the break-down of MEPP under realistic (depolarizing) noise.

5.5.3 Testing the numerics 

The analytical formulas are also very useful for verifying the code of our
numerical calculations. Switching the programs from depolarizing noise to
the simplified noise considered here is a trivial alteration. We find that the
numerical results agree well with the analytics, see Figs. 5.17 and 5.18. This
makes us confident in the correctness of our codes.


Figure 5.14: Inverse of communication cost as function of final fidelity for the
simplified noise model used in Section 5.5.2. Analytical calculation for GHZ
states of different numbers of qubits N, varying from 5 to 70, with alteration
probabilities for the channel and local noise of $(1 - q) = 0.1$ and
$(1 - q_l) = 0.05$, respectively. The green dashed lines stand for the MEPP
strategy, the red solid lines for the BEPP strategy. The blue circles give the
crossing points for all numbers of parties between 5 and 70.

Figure 5.15: Analytical results for different numbers of parties and different
amounts of local noise. Each curve gives the yield as function of fidelity for
the cross-over for a given alteration probability $(1 - q_l)$ (see Eqs. (5.11)
and (5.12)). This parameter varies from $q_l = 0.93$ (left curve) to
$q_l = 0.99$ (right curve).

Figure 5.16: Maximal reachable fidelity $F_{\max}$ plotted against the number of
parties for the simplified model described in Sec. 5.5.2 applied to GHZ states.
The alteration probabilities for the channels and the local noise are given by
$(1 - q) = 0.1$ and $(1 - q_l) = 0.01$, respectively. The results were obtained
analytically.

Figure 5.17: Testing the numerics: We switched the programs that were used
to calculate the results of Section 5.4.2 (program C) and Section 5.4.3
(program S) to the simplified noise model of Section 5.5.2. The plot shows the
inverse of communication cost as function of the final fidelity. The red
symbols (+) stand for the analytical results while the green (x) and the blue
(X) symbols stand for the output of program C and program S, respectively. The
error bars stand for $1\sigma$ errors. A comparison with the derived analytical
formulas shows satisfactory agreement. The calculation was done for MEPP of
GHZ states with 10 qubits (b) at noise levels $q = 0.9$ and $q_l = 0.95$.

Figure 5.18: Testing the numerics obtained for the BEPP strategy. Noisy
entangled pairs, arising from sending one of the qubits through a depolarizing
channel, are purified using the BEPP. The plot shows the inverse of
communication cost as function of final fidelity. The black crosses give the
exact values while the red bars give the numerical results of the Monte Carlo
simulation.


5.6 Summary and conclusions

In this article, we have investigated the quantum communication cost of
preparing a class of multipartite entangled states with high fidelity. The
presence of noisy quantum channels and imperfect local control operations
requires the usage of error correction or, in our case, entanglement
purification schemes to achieve this aim. We have considered various strategies
to generate these high-fidelity states and have established in this way upper
bounds on the quantum communication cost. The optimal strategy strongly
depends on the error parameters for channels and local control operations,
and on the desired target fidelity. For a simple error model and the
generation of GHZ states based on various strategies, we have obtained analytic
results that allow us to compare these strategies. Numerical simulations for
generic error models, based on Monte Carlo simulation, show essentially the
same features as observed in the simplified model. The simulation makes
use of a recently developed method that allows one to efficiently simulate
the evolution of stabilizer states (or graph states) under Clifford operations
on a classical computer [AB06, AG04]. We have also applied this method to
investigate not only the generation of GHZ states but also of other types of
multipartite entangled states, e.g., cluster states.

We find that for high target fidelities, strategies based on multipartite
entanglement purification generally perform better than strategies based on
bipartite purification. For low target fidelities, strategies based on
bipartite purification have a higher efficiency, leading to smaller
communication cost.

We believe that the generation of high-fidelity multipartite entangled
states is of significant importance in the context of (distributed) quantum
information processing. Such multipartite entangled states represent resources,
e.g., for measurement-based quantum computation, conference key agreement
and secret sharing schemes, and may be used for other security tasks.
Our investigation takes both channel noise and noisy apparatus into account.
We could show that the choice of a proper strategy not only allows one to
significantly reduce the quantum communication cost, but also to reach
target-state fidelities that are not accessible otherwise.

Chapter 6

Overview on the numerical treatment of strongly-correlated systems


Strongly correlated quantum systems are a major challenge to contemporary
theoretical physics. Whenever a perturbative treatment cannot give satisfactory
results, a number of sophisticated numerical techniques may be tried.
In this chapter, I attempt to give a brief overview of the most common and
important of these. Even though these methods have given rise to impressive
successes, many phenomena in solid-state physics still await an
explanation, and many researchers feel that exploration of these systems by
numerical means will turn out to be crucial. Frequently cited examples are
high-$T_c$ superconductors and quantum effects in the magnetic properties
of solids. (For an introduction to the research on such systems, see
e.g. [Aue94].) An important aim in this area of research is to deepen the
understanding of quantum phase transitions, i.e., phase transitions that are
not driven by thermal but by quantum fluctuations and hence occur even at zero
temperature. (See [Voj03] for a review.)

Model systems of quantum spins on a lattice are most suitable for such
studies. Given that magnetism is explained by spins within a crystal lattice,
these models are a most natural abstraction. Already the simple classical
XY model¹ (with its critical special case, the Ising model) can become quite
difficult, even more so when made a quantum system by adding a transverse
field. Onsager's analytic solution of the 2D Ising model without field
[Ons44] is considered an early milestone in the field (it invalidated, to
everybody's surprise, most of what was thought to be established about phase
transitions) and seemingly still marks the end of what is possible without
numerics. One of the most challenging of the models of fundamental interest
is surely the Heisenberg antiferromagnet, whose ground state was quickly
found to deviate subtly but crucially from the simple Néel order (i.e.,
neighboring spins are anti-aligned) that one might naively expect and whose
low-energy behaviour defeats any easy explanation with quasi-particles due to
the model's criticality. Interest in the Heisenberg antiferromagnet only
increased with the discovery of cuprate high-$T_c$ superconductivity, as its
origin was soon found to be connected to the ground-state properties of the
Heisenberg antiferromagnet [And87].

¹ See Sec. 8.5.1 for definitions of these models.

Recently, spin systems became relevant even for quantum optics, as it was
theoretically shown [JBC+98] and experimentally demonstrated [GME+02]
that systems of cold atoms trapped in an optical lattice can be tailored to
realize the bosonic version of the Hubbard Hamiltonian. This opens the exciting
possibility of using such setups as a kind of analog computer to simulate spin
systems.

Up to now, however, we have to use digital computers, and the following
sections describe, rather briefly, the most important techniques commonly
used for attacking problems on spin systems. In the chapters following this
one, our work on novel variational techniques for spin systems is described.
The purpose of the present chapter is hence to give an overview of the state
of the art in order to be able to rate the new techniques in the context of what
can already be done with established ones. This overview does not aim to
convey much more than the basic flavour of the described methods. Instead,
I aim to provide a survey of introductions and reviews by suggesting selected
literature.


6.1 Finite-size and infinite-size techniques 

In statistical and solid-state physics, one is usually interested in the properties 
of very large systems; so large, in fact, that their properties are described by 
the so-called thermodynamic limit of infinite system size. Certain numerical 
techniques work natively in this limit, others estimate the numbers of interest 
for finite systems of different sizes, and then obtain the limit by extrapolation 
(usually done by extrapolating for the reciprocal system size tending to zero). 

The possibility of such an extrapolation is due to the fact that from a
certain order of magnitude for the system size onwards, the influence of the
microscopic properties changes only quantitatively and slightly. Roughly
speaking, this means: In a short spin chain, it may make a huge difference
whether a spin has four or eight neighbors, as near neighbors may have a
different effect than "very near" neighbors. For a long spin chain, the spins
far away from a given one will all have roughly the same effect. Whether a
spin "feels" five hundred or a thousand distant neighbors can change
values only slightly and smoothly. This allows one to fit, e.g., a power law
or a polynomial and do a reliable extrapolation. The fitting works only,
however, if the precision of the values obtained for different system sizes is
sufficient to capture the possibly small differences between these. If a method
allows one to calculate for, say, one, two, and three hundred spins but with an
uncertainty above the difference to be expected between these different sizes,
any attempt to fit and extrapolate is bound to drown in noise. It is hence
important to bear in mind that the ability of a technique to treat large
systems is of no value without good precision. Worse even, as the expected
changes get smaller and smaller with system size, ever more precision is needed
for larger systems.
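
The extrapolation in the reciprocal system size can be illustrated with a
minimal sketch. The data below are synthetic (the values `e_inf`, `a`, `b`, the
chosen sizes and the noise level are assumptions made only for illustration);
the point is merely that a polynomial fit in 1/N recovers the limit from clean
data, whereas noise of the same order as the finite-size differences spoils it.

```python
import numpy as np

def extrapolate_to_infinite_size(sizes, values, degree=2):
    """Fit `values` as a polynomial in 1/N and return its value at 1/N = 0."""
    x = 1.0 / np.asarray(sizes, dtype=float)
    coeffs = np.polyfit(x, np.asarray(values, dtype=float), degree)
    return np.polyval(coeffs, 0.0)

# Synthetic finite-size data (illustrative assumption): e(N) = e_inf + a/N + b/N^2
e_inf, a, b = -0.5, 0.3, -0.8
sizes = np.array([8, 12, 16, 24, 32])
exact = e_inf + a / sizes + b / sizes**2

# Clean data: the fit recovers the thermodynamic limit.
clean_limit = extrapolate_to_infinite_size(sizes, exact)

# Noisy data: the uncertainty is comparable to the differences between sizes,
# so the extrapolated value becomes unreliable.
rng = np.random.default_rng(0)
noisy = exact + rng.normal(scale=5e-3, size=sizes.size)
noisy_limit = extrapolate_to_infinite_size(sizes, noisy)

print("clean extrapolation:", clean_limit)
print("noisy extrapolation:", noisy_limit)
```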

In order to estimate the precision of an approximative method, exact values
for comparison are useful. These may be provided by exact diagonalisation
of very small systems with the Lanczos algorithm [Lan50]. Modern
implementations (ARPACK [LMSY96], SPINPACK [Sch]) make this feasible
for up to roughly 25 spin-1/2 sites, which is sometimes even sufficient for a
first rough estimate of thermodynamic limits.
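
As a rough illustration of the idea behind Lanczos diagonalisation, the
following sketch builds a small dense Hamiltonian and extracts its lowest
eigenvalue from the tridiagonal matrix generated in a Krylov subspace. It is a
plain NumPy toy version with full reorthogonalisation, not the ARPACK or
SPINPACK implementation; the open Heisenberg chain of eight sites and the
function names are assumptions chosen for the example.

```python
import numpy as np

def heisenberg_chain(n_sites):
    """Dense H = sum_a S_a . S_{a+1} for an open spin-1/2 chain (illustrative model)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
    sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
    bond = sum(np.kron(s, s) for s in (sx, sy, sz)).real  # the two-site term is real
    dim = 2 ** n_sites
    h = np.zeros((dim, dim))
    for a in range(n_sites - 1):
        h += np.kron(np.kron(np.eye(2 ** a), bond), np.eye(2 ** (n_sites - a - 2)))
    return h

def lanczos_ground_energy(h, n_iter=60, seed=0):
    """Lowest eigenvalue of the symmetric matrix h from a Lanczos tridiagonalisation."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=h.shape[0])
    basis = [v / np.linalg.norm(v)]
    alphas, betas = [], []
    for _ in range(n_iter):
        w = h @ basis[-1]
        alphas.append(basis[-1] @ w)
        for u in basis:                 # full reorthogonalisation (simple, not cheap)
            w -= (u @ w) * u
        beta = np.linalg.norm(w)
        if beta < 1e-12:                # Krylov space exhausted
            break
        betas.append(beta)
        basis.append(w / beta)
    k = len(alphas)
    off = np.array(betas[:k - 1])
    t = np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(t)[0]

h = heisenberg_chain(8)
e_lanczos = lanczos_ground_energy(h)
e_exact = np.linalg.eigvalsh(h)[0]      # dense reference, feasible only for tiny systems
print(e_lanczos, e_exact)
```

In a production code the matrix would never be stored densely; only the action
of the Hamiltonian on a vector is needed, which is what makes the method
feasible for the system sizes quoted above.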

6.2 Series expansions 

In many systems, the approach towards critical points can be treated,
despite high correlation, by means of series expansions, which become exact in
the limit of inclusion of infinitely many terms. While a simple Taylor series
often fails to have a sufficient radius of convergence to reach the parameter
regions of interest, more sophisticated expansion techniques known from
complex analysis, most importantly the Padé expansion [Bak61], have
provided a wealth of results about the precise position of critical points and
the values of critical exponents for a variety of spin systems. These expansion
techniques work directly with infinite systems and can hence, wherever they
are applicable, achieve estimates for thermodynamic limits at much higher
precision than other techniques. As series expansion schemes are rather
different from the ansatzes described in this thesis, I shall not say more
about them but direct the interested reader to the comprehensive monograph
[Bak90]. Furthermore, the very recent result [BDL07] may be of specific
interest in the context of the applications discussed here; it discusses a
series expansion technique allowing for perturbative treatment of interaction
Hamiltonians of arbitrary geometry provided that their strength is in a certain
sense small compared to the spectral gap of the local Hamiltonian.
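
To illustrate why a Padé approximant can reach beyond the radius of convergence
of the Taylor series it is built from, here is a small sketch. The test
function ln(1 + x), the order [3/3] and the helper names are assumptions chosen
for the example; they are not taken from the series-expansion literature cited
above.

```python
import numpy as np

def pade(c, L, M):
    """Coefficients (p, q) of the [L/M] Pade approximant from Taylor coefficients c[0..L+M]."""
    c = np.asarray(c, dtype=float)
    # Denominator q_1..q_M (with q_0 = 1) from sum_j q_j c_{L+k-j} = -c_{L+k}, k = 1..M
    A = np.array([[c[L + k - j] if L + k - j >= 0 else 0.0
                   for j in range(1, M + 1)] for k in range(1, M + 1)])
    q = np.concatenate(([1.0], np.linalg.solve(A, -c[L + 1:L + M + 1])))
    # Numerator from matching the first L+1 Taylor coefficients
    p = np.array([sum(q[j] * c[i - j] for j in range(min(i, M) + 1))
                  for i in range(L + 1)])
    return p, q

def eval_ratio(p, q, x):
    return np.polyval(p[::-1], x) / np.polyval(q[::-1], x)

# Taylor coefficients of ln(1 + x): c_0 = 0, c_n = (-1)^(n+1) / n, radius of convergence 1
c = [0.0] + [(-1) ** (n + 1) / n for n in range(1, 7)]
x = 2.0                                    # outside the radius of convergence
p, q = pade(c, 3, 3)
pade_value = eval_ratio(p, q, x)
taylor_value = np.polyval(np.array(c)[::-1], x)   # truncated Taylor series, same input
print(pade_value, taylor_value, np.log(1 + x))
```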

6.3 Real-space renormalization group 

The concept of renormalization has been developed in quantum field theory
to deal with the divergence of series expansions into Feynman diagrams
and became, over several decades of research², a most fundamental
methodological concept for any problem involving many different scales of length
or energy that cannot clearly be separated. This situation occurs not only in
the evaluation of loop corrections in quantum electrodynamics, the original
starting point, but also in many problems of solid-state physics, and hence
the subject was soon recognized as a collaborative effort of high-energy and
of solid-state physicists. Wilson's solution [Wil75] of the Kondo problem³
is usually considered the most spectacular success in applying renormalization
to a solid-state setting. He introduced there the concept of the real-space
renormalization group, which uses Kadanoff's idea of spin blocking
(or: "block decimation") to move the high-energy physics concepts from
momentum to position space and then find the scaling relations for the Kondo
setting. (These scaling relations, postulated by Widom, are the mathematical
formulation behind the idea which is often called the universality of phase
transitions.) Kadanoff's spin blocking is a way of coarse-graining a lattice
of spins: Briefly put, we may expect near a phase transition that the physics
looks similar at all length scales because, with the diverging correlation
length, a natural length scale is no longer present. Hence we may coarse-grain
the infinite lattice by dividing it up into small, equal-sized blocks, each
containing a few neighboring spins. Every block is then replaced by a single
spin whose state is derived from the states of the spins in the block according
to some chosen rule. The rescaled lattice, i.e., the lattice of these blocked
spins, is then described by a Hamiltonian whose form is typically the same as
before but whose coefficients have changed according to a relation that has to
be derived. This renormalization relation is iterated to go to larger and
larger blocks until it does not change any more. At such a fixed point we may
assume, according to Kadanoff and Wilson, that all microscopic degrees of
freedom have been "averaged out" and what is left describes the macroscopic
appearance of the near-critical system. The fixed-point form of the
transformation then gives access to these properties and the critical exponents
of the Widom scaling at this critical point.

² Of the many researchers who have contributed to renormalization, Wilson ranks
Bethe, Schwinger, Feynman and Dyson as the most important ones. [Wil75]
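
The iteration of a renormalization relation towards a fixed point can be made
concrete with the simplest textbook case, which is swapped in here purely for
illustration and is not the Kondo calculation: decimating every second spin of
the zero-field one-dimensional Ising chain with dimensionless coupling K maps
the chain onto itself with a new coupling K' obeying tanh K' = (tanh K)^2. The
sketch below iterates this map; the starting value 1.5 is an arbitrary
assumption.

```python
import numpy as np

def decimate(K):
    """One decimation step for the zero-field 1D Ising chain: tanh K' = tanh(K)**2."""
    return np.arctanh(np.tanh(K) ** 2)

def flow(K0, n_steps=20):
    """Iterate the renormalization relation starting from coupling K0."""
    Ks = [K0]
    for _ in range(n_steps):
        Ks.append(decimate(Ks[-1]))
    return np.array(Ks)

trajectory = flow(1.5)
print(trajectory[:6])

# Equivalent closed form of the same relation: K' = (1/2) ln cosh(2K)
check = 0.5 * np.log(np.cosh(2 * 0.7))
```

For any finite starting coupling the flow runs into the trivial fixed point
K = 0, which reflects the absence of a finite-temperature phase transition in
one dimension; the only other fixed point, K = infinity, is unstable.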

This brief account may only give a very rough idea of what real-space
renormalization is about. The interested reader may find a full exposition
focusing on the Kondo problem in Wilson's review [Wil75], an account of
renormalization's history in Wilson's Nobel lecture [Wil93], a modern treatment
(focusing on the Ising system as example) in Kadanoff's textbook [Kad00]
and in Fisher's excellent review [Fis98], and a shorter overview focusing on
computational aspects in Pang's textbook on computational physics [Pan97].
The convoluted history of the field seems to make it especially hard for the
newcomer to know where to start, and hence I also would like to recommend
Domb's monograph [Dom96], which is simultaneously an introduction
to and a chronicle of renormalization.

³ Kondo noticed that certain metals with very dilute ferromagnetic impurities
show peculiar behavior at very low temperature, namely the impurities' effect
on, e.g., electrical resistance, is much larger than one should expect from the
direct coupling of the impurities' spins. He conjectured the reason to be that
the coupling of the impurity spins to the spins of conducting electrons becomes
dominant below a temperature of the order of 1 K. The problem got a
notorious reputation of being extremely hard after many, very different,
approaches failed to solve Kondo's deceptively simple-looking model Hamiltonian
for the zero-temperature limit of this situation. Wilson got his Nobel prize
for demonstrating the power of his real-space renormalization group by finally
solving the mystery of the Kondo problem.


Wilson's original position-space renormalization technique is rarely used
now. This is because, in startling contrast to the work on the Kondo problem,
it turned out to fail for many systems of interest. Nevertheless, many of its
aspects have been incorporated into other methods. In particular, it gave rise
to the density-matrix renormalization group (DMRG), to be discussed in the next
section.


Another fruitful approach was to merge Wilson renormalization with
Monte Carlo integration: In the expression for the renormalization of the
Hamiltonian, an average of the exponentiated inter-block part of the
Hamiltonian, weighted by the Boltzmann factors with respect to the intra-block
part, has to be performed. For some systems, this average can be approximated
well by means of a short Taylor expansion. More often, however, this
Taylor series converges too slowly, and then it is of huge advantage to
use the Metropolis algorithm. This technique is known as the Monte Carlo
renormalization group and has been pioneered by Swendsen [Swe79]. As
Kolb showed for the example of the Ising model with transverse field, the
method can also be used for quantum systems [Kol83].


Another modern renormalization algorithm that is based on Kadanoff-Wilson
position-space renormalization is contractor renormalization (CORE)
[MW94, MW96]. Here, the fact is used that a Hamiltonian "contracts" a trial
state towards the ground state if $e^{-tH}$ is applied to it. The blocking
scheme is now used to find a good approximation to the operator $e^{-tH}$ (for
$t$ smaller than some $t_{\max}$) by projecting onto a set of low-lying states
within the Hilbert space of a block. This gives rise to an effective
Hamiltonian, for which the real-space renormalization group is employed. The
technique was first used semi-analytically and recently transformed into a
fully numerical method [CLM04, SW06].
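
The contraction property used by CORE can be checked on a toy example. The
sketch below is only an illustration of how applying e^{-tH} drives a trial
state towards the ground state; it does not implement the blocking or the
effective Hamiltonian of CORE. The 6x6 hopping matrix used as "Hamiltonian" and
the uniform trial state are assumptions chosen for the example.

```python
import numpy as np

n = 6
H = -(np.eye(n, k=1) + np.eye(n, k=-1))     # toy Hamiltonian: hopping on an open chain
evals, evecs = np.linalg.eigh(H)
ground = evecs[:, 0]

def apply_exp_minus_tH(psi, t):
    """Apply exp(-t H) via the spectral decomposition of H and renormalise."""
    weights = np.exp(-t * (evals - evals[0]))   # energy shift avoids overflow
    phi = evecs @ (weights * (evecs.T @ psi))
    return phi / np.linalg.norm(phi)

trial = np.ones(n) / np.sqrt(n)             # trial state with nonzero ground-state overlap
times = [0.0, 0.5, 2.0, 20.0]
overlaps = [abs(ground @ apply_exp_minus_tH(trial, t)) for t in times]
print(overlaps)
```

The overlap with the ground state grows monotonically with t, because every
excited component is damped by a factor exp(-t (E_i - E_0)) relative to the
ground-state component.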


Finally, one should mention that many typical methods from quantum
field theory owe their applicability to condensed matter theory to the work
on the renormalization group. A very productive tool is here the so-called
epsilon expansion, which treats systems with $d < 4$ dimensions as a
perturbation of the four-dimensional case (which shows less divergence in path
integral expansions), expanding in powers of $\epsilon = 4 - d$. (For more
information, see e.g. [Dom96].)



6.4 Density-matrix renormalization group 

6.4.1 Standard DMRG 

Despite the impressive success that Wilson had with applying real-space
renormalization to the Kondo problem, many subsequent attempts to use the
method for other problems led to failures. In Ref. [WN92], White and Noack
analyzed the reason for these failures. Following a suggestion by Wilson,
they studied the discretised version of the standard textbook problem of a
single quantum particle in a one-dimensional infinite potential well. They
demonstrated that the failure of the method is due to the fact that the current
block is unaware of its surroundings, which forces unnatural boundary
conditions onto the wave function. The solution proposed by White soon
afterwards [Whi92, Whi93] was to apply renormalization not to wave functions
but to density matrices, which are seen as the result of tracing out the
environment of the block from the pure wave function of the full, possibly
infinite, system. This "density-matrix renormalization group" (DMRG) has turned
out to be a very powerful technique to treat one-dimensional spin chains,
because there, it is a natural approach to simply use a copy of the state of
the block as a preliminary model for the environment.

The rationale behind White's DMRG is exposed very clearly in his review
[Whi98], and I shall hence only mention the method's key points: We have
an infinite chain of $d$-level systems, governed by a translationally invariant
Hamiltonian⁴ of the form $H_{\mathrm{chain}} = \sum_a H^{(a,a+1)}$, and wish to
approximate the expectation value of an observable $O$ for the ground state (or
a low-lying excited state). The method consists of two parts, often called the
"warm-up" and the "sweeping phase". The first part, also denoted the "infinite
size DMRG", starts with a four-site chain and builds up a longer and longer
chain by inserting pairs of sites in the middle. Once a size has been reached
which is considered sufficiently long, precision can be improved by "sweeping
through" the chain and making local corrections.

In a nutshell, the warm-up works as follows: Start with a four-site chain
(with open boundary conditions) and get its exact ground state by
diagonalizing the Hamiltonian with the Lanczos algorithm. Now, trace out the
right half (the "environment") in order to get a reduced density matrix $\rho$
that describes the left half (the "block"). Diagonalize $\rho$ and discard all
but the eigenstates $|u_i\rangle$ corresponding to the $D$ largest eigenvalues
$\lambda_i$. The block is now approximated as
$\rho \approx \sum_{i=0}^{D-1} \lambda_i |u_i\rangle\langle u_i|$. In most
cases, the eigenvalues of $\rho$ fall off very quickly so that for sufficiently
large $D$ the approximation is very good. Now, insert two further spins in the
middle. The block and the left one of these spins form what is called the
"superblock". States in the superblock's Hilbert space can now be approximated
by projecting into the subspace formed by the tensor product of the truncated
block space spanned by $\{|u_i\rangle\}_{i=0}^{D-1}$ and the single-spin space
$\mathbb{C}^d$. This subspace is isomorphic to
$\mathbb{C}^D \otimes \mathbb{C}^d$, and the space of both the superblock and
the "super-environment" is
$\mathbb{C}^D \otimes \mathbb{C}^d \otimes \mathbb{C}^d \otimes \mathbb{C}^D$.
The Hamiltonian (or more precisely, the Hamiltonian truncated by the projection
onto the $|u_i\rangle$) can hence be written as a $D^2 d^2 \times D^2 d^2$
matrix by using the transformation matrix formed by the expansion of the
$|u_i\rangle$ in the old basis. A Lanczos diagonalisation of the Hamiltonian
yields a new ground state, now for the enlarged chain, but still living in the
manageable $D^2 d^2$-dimensional Hilbert space. Again, we trace out the right
half of the now longer chain to get a new reduced density matrix $\rho$, and
iterate this procedure until the chain is long enough.

⁴ One can also treat more general Hamiltonians; especially the requirement for
translational invariance can be dropped in a straightforward manner.
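
The first truncation step of the warm-up can be sketched in a few lines. This
is a toy illustration under stated assumptions, not White's implementation: the
Hamiltonian is taken to be the open spin-1/2 Heisenberg chain, a dense
diagonalisation stands in for Lanczos, and D = 2 is chosen only to have
something to discard on a four-site chain.

```python
import numpy as np

def chain_hamiltonian(n):
    """Open spin-1/2 Heisenberg chain H = sum_a S_a . S_{a+1} (illustrative choice)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
    sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
    bond = sum(np.kron(s, s) for s in (sx, sy, sz)).real
    H = np.zeros((2 ** n, 2 ** n))
    for a in range(n - 1):
        H += np.kron(np.kron(np.eye(2 ** a), bond), np.eye(2 ** (n - a - 2)))
    return H

# Ground state of the four-site chain (dense solver used in place of Lanczos)
energies, states = np.linalg.eigh(chain_hamiltonian(4))
psi = states[:, 0].reshape(4, 4)     # row index: left half (block), column: right half

# Reduced density matrix of the block: trace out the environment
rho = psi @ psi.conj().T

# Diagonalise rho and keep the D eigenstates with the largest eigenvalues
lam, u = np.linalg.eigh(rho)
order = np.argsort(lam)[::-1]
lam, u = lam[order], u[:, order]
D = 2
projector = u[:, :D]                 # columns are the kept |u_i> in the old basis
kept_weight = lam[:D].sum()          # 1 - kept_weight is the discarded weight
print("eigenvalues of rho:", lam)
print("kept weight for D = 2:", kept_weight)
```

The matrix `projector` plays the role of the transformation matrix mentioned
above: block operators are carried into the truncated basis by
`projector.conj().T @ op @ projector`.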

Note that the ground state is always represented as a state in a truncated
Hilbert space of the form
$\mathbb{C}^D \otimes \mathbb{C}^d \otimes \mathbb{C}^d \otimes \mathbb{C}^D$
(or, in the very first step, $(\mathbb{C}^d)^{\otimes 4}$). The outer parts,
$\mathbb{C}^D$, always encompass the bulk of the chain, each of the $D$ basis
states representing a, typically highly entangled, state on one half (minus one
site) of the chain. As the truncations are governed by the spectrum of the
reduced density matrix, which captures the entanglement between the two halves
of the chain, the "boundary" condition in the middle, where new sites are
inserted, is typical for sites in the middle of the system, not for those at
the actual boundary. This is the crucial feature that gives rise to the
tremendous advantage that DMRG has over Wilson position-space renormalization,
where the state was always extended into the "void" of yet untreated length
scales. However, the scheme relies on the idea of introducing single sites
along the boundary. While in a chain these are only two sites, on an
$L \times L$ plane there would be $2L$ sites, blowing up the intermediate
Hamiltonian to be diagonalized to a $D^2 d^{2L} \times D^2 d^{2L}$ matrix. Such
a huge matrix can no longer be written down, let alone diagonalized. This is
why DMRG is confined to one-dimensional systems.

In order to find the expectation value of an observable, the observable
has to be represented by a $D^2 d^2 \times D^2 d^2$ matrix in the same fashion
as the Hamiltonian, so that in the end it can be trivially applied to the
resulting state. Note, by the way, that DMRG can also find low-lying excited
states, not just the ground state, by targeting not the lowest but one of the
other low eigenvectors in the Lanczos diagonalisation.⁵ This is a very useful
feature that we shall miss in the variational methods to be discussed later.

After one has built up a state representing a chain of sufficient length,
one may further improve the accuracy by doing a few so-called sweeps. In
the warm-up one always inserts two sites between the two blocks and then
absorbs them into the blocks. In a sweep step, one absorbs only one spin
into its adjacent block and then, instead of inserting two new sites, separates
off one site from the other block. The effect is that the chain keeps constant
length but the two individual spins move away from the middle towards the
border. Each step involves a diagonalisation for the absorption of one spin
into the growing block, which gives the chance to improve the precision of
the calculation. This is because in the warm-up, a copy of the block serves as
a place-holder for the environment to trace over. Now, we trace over a better
approximation of the environment, and the chain converges very quickly to
a fixed point, which usually is extremely close (with an energy value precise
to five or more digits) to the correct state.

⁵ Remember that Lanczos diagonalisation yields only the lowest and the highest
eigenvalues, not those in the middle.

DMRG has been used for a large array of problems; the book [PWKH99]
and the review [Sch05] give an overview of these diverse applications.

6.4.2 Improvements 

When done in the style described above, the observables have to be
transformed along with the state in order to be available in the truncated
basis. Alternatively, one may store the operators that carry out the
truncation. As Östlund and Rommer realized [OR95], the state's expansion in the
product basis can then be written in the elegant form

$$|\Psi\rangle = \sum_{s_1,\dots,s_N=0}^{d-1} \alpha^{[s_1]} A^{2,[s_2]} A^{3,[s_3]} \cdots A^{N-1,[s_{N-1}]} \beta^{[s_N]} \, |s_1 s_2 \dots s_N\rangle, \qquad (6.1)$$

where the $D \times D$ matrices $A^{a,[s]}$ contain the transformation that was
found in the states $|u_i\rangle$ of the previous description. The vectors
$\alpha^{[s]}$ and $\beta^{[s]}$ reflect the very first basis truncation and
account for boundary conditions. In the case of periodic boundary conditions,
the form becomes even simpler, namely

$$|\Psi\rangle = \sum_{s_1,\dots,s_N=0}^{d-1} \operatorname{tr}\left( A^{1,[s_1]} A^{2,[s_2]} \cdots A^{N,[s_N]} \right) |s_1 s_2 \dots s_N\rangle. \qquad (6.2)$$

These states are known as "matrix product states" (MPS) or
"finitely-correlated states" and have already been studied previously by
various authors. They turn out to be a generalization of the so-called
valence-bond solid states (VBS), which are the solution of the AKLT Hamiltonian
(an exactly solvable model for one-dimensional antiferromagnets) [AKLT87]. In
particular, Fannes et al. [FNW92] have determined that such states are capable
of describing all states of a spin chain that are finitely correlated. (The
precise meaning of this limitation is discussed later.) This made clear the
limitations of DMRG, namely that it is incapable of representing spin chains
whose correlations fall off more slowly than exponentially.
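
The periodic form (6.2) translates directly into code: each amplitude is the
trace of a product of N matrices. The sketch below is an illustration with
assumed names and array layout (`tensors[a][s]` holds the D x D matrix for site
a and local state s); as a check it uses the four-qubit GHZ state, which has a
simple exact representation with D = 2.

```python
import itertools
import numpy as np

def mps_amplitude(tensors, config):
    """Amplitude tr(A^{1,[s_1]} ... A^{N,[s_N]}) for one basis configuration."""
    D = tensors[0].shape[1]
    m = np.eye(D)
    for A, s in zip(tensors, config):
        m = m @ A[s]
    return np.trace(m)

def mps_to_vector(tensors):
    """Full state vector of a periodic MPS (feasible for small N only)."""
    d = tensors[0].shape[0]
    n_sites = len(tensors)
    return np.array([mps_amplitude(tensors, cfg)
                     for cfg in itertools.product(range(d), repeat=n_sites)])

# GHZ-type state: A[0] = diag(1, 0), A[1] = diag(0, 1), identical on every site
A = np.array([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])])   # shape (d, D, D) = (2, 2, 2)
ghz_tensors = [A] * 4
vec = mps_to_vector(ghz_tensors)      # unnormalized |0000> + |1111>
print(vec)
```

The representation needs N d D^2 numbers, whereas the full state vector has d^N
entries; the expansion into a vector is done here only to verify the result.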

Östlund and Rommer used the MPS form to reformulate DMRG as a variational
method: The general form (6.2) is seen as trial wave function, and
for an infinite translationally invariant chain, the ansatz can be simplified by
making the matrices $A^{a,[s]}$ independent of the spin index $a$. Minimizing
the energy with respect to a translationally invariant Hamiltonian, they could
reproduce well-known values for the general AKLT Hamiltonian [OR95, RO97].

This became the starting point for a variety of fruitful improvements and
generalizations of DMRG. Verstraete et al. [VPC04] revisited the construction
of matrix product states from the valence-bond solid picture and realized
that this solves an old problem of DMRG, namely its bad performance under
periodic boundary conditions. Working consistently in the form (6.2) and
using the eigenvalue minimization also used by Östlund and Rommer, but
without making the matrices $A^{a,[s]}$ independent of $a$ (not even for
translation-invariant Hamiltonians), gives a performance as good as that of
standard DMRG with open boundary conditions (surprisingly, even though the
warm-up phase was omitted).

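As this ansatz will also be the basis of much of Chapter 9, we shall briefly
sketch this technique using the formulation employed later: For a matrix
product state $|\Psi\rangle$ of the form (6.2) and an observable $O$ which is a
tensor product of local observables, i.e.,

$$O = \bigotimes_{a=1}^{N} O_a,$$

the (unnormalized) expectation value of $O$ can be computed as follows:

$$\langle\Psi|O|\Psi\rangle
= \sum_{s_1,\dots,s_N=0}^{d-1} \; \sum_{s'_1,\dots,s'_N=0}^{d-1}
\left( \prod_{a=1}^{N} \langle s'_a | O_a | s_a \rangle \right)
\left[ \operatorname{tr} \prod_{a=1}^{N} A^{a,[s'_a]} \right]^{*}
\left[ \operatorname{tr} \prod_{a=1}^{N} A^{a,[s_a]} \right]
= \operatorname{tr} \prod_{a=1}^{N} B^{[a]},
\qquad
B^{[a]} := \sum_{s,s'=0}^{d-1} \langle s' | O_a | s \rangle \, \bar{A}^{a,[s']} \otimes A^{a,[s]},
\qquad (6.3)$$

where the bar denotes complex conjugation, so that each $B^{[a]}$ is a
$D^2 \times D^2$ matrix.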
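
The evaluation of such an expectation value as a trace over a product of
D^2 x D^2 matrices, one per site, can be sketched as follows. The function and
variable names, the array layout (`A[s]` is the D x D matrix for local state s)
and the random test state are assumptions for illustration; the brute-force
comparison expands the state into a full vector and is possible only for very
small N.

```python
import itertools
import numpy as np

def site_matrix(A, op):
    """sum_{s,s'} <s'|op|s> conj(A[s']) (x) A[s], a D^2 x D^2 matrix."""
    d, D, _ = A.shape
    B = np.zeros((D * D, D * D), dtype=complex)
    for sp in range(d):
        for s in range(d):
            B += op[sp, s] * np.kron(A[sp].conj(), A[s])
    return B

def expectation(tensors, ops):
    """Unnormalized <Psi| O_1 (x) ... (x) O_N |Psi> for a periodic MPS."""
    D = tensors[0].shape[1]
    E = np.eye(D * D, dtype=complex)
    for A, op in zip(tensors, ops):
        E = E @ site_matrix(A, op)
    return np.trace(E)

# Random periodic MPS (illustrative assumption) and a product observable
rng = np.random.default_rng(0)
N, d, D = 4, 2, 3
tensors = [rng.normal(size=(d, D, D)) + 1j * rng.normal(size=(d, D, D))
           for _ in range(N)]
sz, one = np.diag([1.0, -1.0]), np.eye(2)
ops = [sz, one, sz, one]
fast = expectation(tensors, ops)

# Brute-force reference: build the full state vector and the full operator
psi = np.array([np.trace(np.linalg.multi_dot([A[s] for A, s in zip(tensors, cfg)]))
                for cfg in itertools.product(range(d), repeat=N)])
full_op = ops[0]
for op in ops[1:]:
    full_op = np.kron(full_op, op)
brute = psi.conj() @ full_op @ psi
print(fast, brute)
```

The cost of the trace formula grows linearly in N (times a polynomial in D and
d), in contrast to the exponential cost of the brute-force reference.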


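We may rewrite the matrices $A^{a,[s]}$ for a site $a$ as a vector by
concatenating the columns of the matrices,

$$\mathbf{a}^{[a]} = \bigoplus_{s=0}^{d-1} \bigoplus_{r=0}^{D-1} A^{a,[s]}_{\cdot, r}.$$

Then, for a fixed site $a$, the expectation value is a quadratic form in
$\mathbf{a}^{[a]}$, i.e.,

$$\langle\Psi|O|\Psi\rangle = \mathbf{a}^{[a]\dagger} \, \tilde{O}^{[a]} \, \mathbf{a}^{[a]}$$

with

$$\tilde{O}^{[a]}_{(s'l'r'),(slr)} = \left( B^{[a+1]} \cdots B^{[N]} B^{[1]} \cdots B^{[a-1]} \right)_{(r'r),(l'l)} \langle s' | O_a | s \rangle. \qquad (6.4)$$

Here $B^{[b]}$ denotes the $D^2 \times D^2$ matrix that site $b$ contributes to
the trace in Eq. (6.3), with row index $(l'l)$ and column index $(r'r)$.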

We now wish to find the optimal value for the matrices $A^{a,[s]}$ at a certain
site $a$ such that the energy
$E(\Psi) = \langle\Psi|H|\Psi\rangle / \langle\Psi|\Psi\rangle$ is minimal, with
the values of the matrices at the other sites $b \neq a$ being held fixed. To
do so we have to minimize the quotient

$$\frac{\mathbf{a}^{[a]\dagger} \, \tilde{H}^{[a]} \, \mathbf{a}^{[a]}}{\mathbf{a}^{[a]\dagger} \, \tilde{\mathbb{1}}^{[a]} \, \mathbf{a}^{[a]}},$$

where the tilde on an operator denotes the forming of a matrix according
to Eq. (6.4). (For this, the Hamiltonian has to be written as a sum of tensor
product observables, $H = \sum_i \bigotimes_a O^{(i)}_a$, each of which forms a
matrix, and these matrices are summed up again,
$\tilde{H}^{[a]} = \sum_i \tilde{O}^{(i)[a]}$.)

Such a quotient of two quadratic forms is known as a generalized
Rayleigh quotient and can be minimized by assigning to $\mathbf{a}^{[a]}$ the
generalized eigenvector corresponding to the smallest generalized eigenvalue of
the generalized eigenvalue problem
$\tilde{H} \mathbf{a} = \lambda \tilde{\mathbb{1}} \mathbf{a}$. Using the QZ
algorithm [MS73] (an implementation of which is provided in LAPACK [ABB+99]),
this vector can be found efficiently and accurately.⁶
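
The minimization of a generalized Rayleigh quotient can be sketched with plain
NumPy. Instead of the QZ algorithm, the sketch uses the reduction to an
ordinary Hermitian eigenvalue problem by a Cholesky factorization, which
assumes that the matrix in the denominator is Hermitian and positive definite.
The random test matrices and the names `H`, `Nmat` and `min_generalized_eig`
are assumptions for illustration only.

```python
import numpy as np

def min_generalized_eig(H, Nmat):
    """Smallest lam and vector a with H a = lam * Nmat a.

    Assumes H Hermitian and Nmat Hermitian positive definite.
    """
    L = np.linalg.cholesky(Nmat)              # Nmat = L L^dagger
    Linv = np.linalg.inv(L)
    C = Linv @ H @ Linv.conj().T              # ordinary problem C y = lam y
    w, y = np.linalg.eigh(C)
    a = Linv.conj().T @ y[:, 0]               # back-transform: a = L^{-dagger} y
    return w[0], a

def rayleigh(H, Nmat, v):
    """Generalized Rayleigh quotient (v^dagger H v) / (v^dagger Nmat v)."""
    return (v.conj() @ H @ v).real / (v.conj() @ Nmat @ v).real

# Random test problem (illustrative assumption)
rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (X + X.conj().T) / 2
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Nmat = Y @ Y.conj().T + n * np.eye(n)         # Hermitian positive definite

lam, a = min_generalized_eig(H, Nmat)
print(lam, rayleigh(H, Nmat, a))
```

If the denominator matrix is only positive semi-definite or badly conditioned,
the Cholesky route breaks down and a more robust solver is needed.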

6.4.3 Time evolution 

Various attempts have been undertaken to study dynamics and time evolution
with DMRG. When staying within White's formalism, one has to be
careful to allow for proper adaptation of the bases. Failing to do so was the
mistake made in the attempt of Ref. [CM02]; it was identified in the comment
[LXW03], where a resolution was also suggested. Due to the limited space in a
PRL comment, a proper exposition of the method was not given there but may be
found in Refs. [MMN05, Man06].

Another approach came from the quantum information community: In
order to contribute to the then very lively debate about what makes a quantum
computer powerful, Vidal argued that strong entanglement is crucial,
because a quantum computation which leaves the entanglement weak can be
simulated efficiently on a classical computer. Vidal proved this point by giving
an explicit algorithm to do so [Vid03]. Weak entanglement here means
that if one cuts the quantum register in two chains, then the Schmidt rank of
the state with respect to this bipartition has to stay bounded by a constant.
This is because Vidal's representation, which allows one to read off the Schmidt
representation for each such bipartition, is very similar to the matrix product
state representation and hence has the same limitations on representable
states. Ref. [Vid04] discusses how the algorithm (now called time-evolving
block decimation (TEBD)) may be used to simulate the time evolution of a
spin chain and makes the connection to DMRG theory. Further elaborations
and applications of TEBD are discussed in Refs. [DKSV04, WF04].

6 As $\tilde{H}$ and $\tilde{\mathbb{1}}$ are Hermitian, one may alternatively reduce the problem to an ordinary
eigenvalue problem by means of a Cholesky factorization of $\tilde{\mathbb{1}}$ [TB97, ABB+99].

6.4.4 Generalizations to other geometries

The restriction of DMRG to chains and rings is a severe limitation, and
generalizations to higher-dimensional geometries have long been sought. A
simple scheme is to look at ladders, i. e., pseudo-two-dimensional lattices on
long strips that are only a few sites wide. There, each DMRG spin represents
the (typically two to four) spins forming one step of the ladder. This is useful,
e. g., in order to study models for frustrated systems.

Attempts to use the chain-like matrix product states to treat 2D systems
by letting the spin chain run through the lattice in "serpentines" have not
been very successful. This is not surprising as the strong correlations between
nearest neighbors are mediated by the many matrices between a spin in
one row and the next. Systems with tree-like geometries, however, still
allow for treatment in this manner (as will be seen in Chapter 9) and this has
been used successfully to study exciton transport on tree-shaped molecules
[MRS02, Rod02].

In Ref. [VC04b], Verstraete and Cirac revisit the valence bond construction
already mentioned above (see also Sec. 5.2.1) and study the entanglement
properties of states so gained. This led them to a natural extension
of matrix product states now called projected entangled-pair states (PEPS)
[VC04a]. These states may be formed to accommodate any lattice geometry
(or even any graph), have entanglement properties that may keep up with
the area law (to be discussed in detail in Sec. 9.4), and reduce, when
constructed for a chain geometry, to the matrix product states. An algorithm for
the evaluation of observables is provided (which, however, has to resort to certain
approximations), and ground states of given system Hamiltonians may be
found by performing imaginary time evolution with a straightforward
generalization of TEBD. The recent article [MVC07] shows how the method works
in practice. I shall not go into more details here but postpone the discussion
to Chapter 9, where we discuss PEPS in a more general context.

Finally, it is worth mentioning that by means of transfer matrices,
spin-chain DMRG can also treat 2D classical systems [Nis95], which in turn
allows one (via the Suzuki mapping, see Sec. 6.5) to study thermal properties of 1D
quantum systems [BXG96]. An alternative ansatz, based on the introduction
of mixed state analogs to the pure MPSs, is explored in Refs. [VGC04, ZV04].
It is superior to the transfer matrix method as it does not require translation
invariance and works better for finite-size systems and at low temperature.

6.5 Quantum Monte Carlo 

The idea that randomness can help to numerically solve exact problems more
efficiently is not new. 7 Its great potential was realized by Ulam during

7 Buffon's needle (1733) [Wei05] may be one of the oldest examples though it is not exactly 
more efficient than a non-randomized calculation. 




the Manhattan project, who coined the term "Monte Carlo integration" for
the use of the fact that the precision of numerical integration falls off
exponentially with the number of function arguments when integrating along a
regular grid but stays good when sampling at randomly chosen points. The
idea was boosted by the discovery of the Metropolis algorithm [MRR+53],
which allows one to study thermal equilibrium states of classical systems (and,
as Hastings realized [Has70], also statistical systems far from physics). This
is now known as the "Metropolis-Hastings algorithm" or "Markov chain Monte
Carlo". In essence, the Metropolis algorithm is a scheme to sample the phase
space (or solution space) according to a probability distribution whose
normalization cannot be calculated, by choosing the sample points with a random
walk biased with the so-called Metropolis condition (also known as "detailed
balance"). A good introduction to this topic can be found in the classic
little book of Hammersley and Handscomb [HH64].
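
The following minimal sketch illustrates the point that only ratios of weights are needed. It samples a small classical Ising ring with single-spin-flip Metropolis updates and compares the estimated mean energy with a brute-force reference. The model, the function names and all parameter values are assumptions chosen for illustration; they are not taken from the references cited above.

```python
import itertools
import math
import random


def ring_energy(spins):
    """Energy of a classical Ising ring, E = -sum_i s_i * s_{i+1}."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def exact_mean_energy(n, beta):
    """Reference value by brute-force enumeration (feasible only for small n)."""
    z = 0.0
    e_sum = 0.0
    for config in itertools.product((-1, 1), repeat=n):
        e = ring_energy(config)
        w = math.exp(-beta * e)
        z += w
        e_sum += e * w
    return e_sum / z


def metropolis_mean_energy(n, beta, steps, burn_in, rng):
    """Estimate <E> with single-spin-flip Metropolis updates.

    Only ratios of Boltzmann weights enter, so the normalization Z
    is never computed.
    """
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ring_energy(spins)
    total = 0.0
    for step in range(steps + burn_in):
        i = rng.randrange(n)
        # Energy change if spin i were flipped (spins[-1] closes the ring).
        delta = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n])
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins[i] = -spins[i]
            energy += delta
        if step >= burn_in:
            total += energy
    return total / steps


if __name__ == "__main__":
    rng = random.Random(1)
    print("Metropolis:", metropolis_mean_energy(6, 0.5, 200_000, 1_000, rng))
    print("exact:     ", exact_mean_energy(6, 0.5))
```

The enumeration in `exact_mean_energy` needs the full sum over all configurations, whereas the random walk only ever evaluates the local energy change of a proposed flip.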

Suzuki's seminal observation that d-dimensional quantum spin systems
may be mapped onto (d + 1)-dimensional classical systems [Suz76] paved
the way to quantum Monte Carlo (QMC) [SMK77]. The idea of the Suzuki
mapping is quickly explained: In order to find a value of interest, say, the
partition function $Z = \operatorname{tr} e^{-\beta H}$ of a system of $N$ spins (each with $n$ levels)
with Hamiltonian $H$ at inverse temperature $\beta$, we use the product basis $\mathcal{B}$ of
$(\mathbb{C}^n)^{\otimes N}$, and write

$$Z = \operatorname{tr} e^{-\beta H} = \lim_{\substack{L \to \infty \\ \Delta\beta = \beta/L}} \sum_{s_1 \in \mathcal{B}} \langle s_1 | (1 - \Delta\beta H)^L | s_1 \rangle.$$

Inserting decompositions of the identity before each factor of the Trotter
decomposition, we get

$$Z = \sum_{s_1, s_2, \dots, s_L \in \mathcal{B}} \langle s_1 | (1 - \Delta\beta H) | s_2 \rangle \langle s_2 | (1 - \Delta\beta H) | s_3 \rangle \cdots \langle s_L | (1 - \Delta\beta H) | s_1 \rangle.$$

This is now formally identical to the partition function of a classical system
with $N \times L$ spins, which is then treated with the Metropolis algorithm. Using
a large $N$ brings us towards the thermodynamic limit of large systems, and a
large $L$ towards the limit of zero temperature.
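
The convergence of the product formula can be checked on the smallest possible example. The sketch below uses a single spin with the toy Hamiltonian H = -B σ_x (an assumption made here so that the exact answer 2 cosh(βB) is known in closed form) and evaluates tr (1 - ΔβH)^L by explicit 2×2 matrix multiplication. The function names are illustrative.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trotter_partition_function(beta, field, slices):
    """tr[(1 - (beta/slices) H)^slices] for the toy Hamiltonian H = -field * sigma_x."""
    dbeta = beta / slices
    # 1 - dbeta * H with H = [[0, -field], [-field, 0]]
    factor = [[1.0, dbeta * field], [dbeta * field, 1.0]]
    product = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(slices):
        product = mat_mul(product, factor)
    return product[0][0] + product[1][1]


def exact_partition_function(beta, field):
    """tr exp(-beta H) = 2 cosh(beta * field); H has eigenvalues +field and -field."""
    return 2.0 * math.cosh(beta * field)


if __name__ == "__main__":
    exact = exact_partition_function(1.0, 1.0)
    for slices in (10, 100, 1000, 10000):
        approx = trotter_partition_function(1.0, 1.0, slices)
        print(slices, abs(approx - exact))
```

The deviation shrinks roughly in proportion to 1/L, which is why a large number of slices is needed at low temperature (large β at fixed Δβ).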

Nowadays, the term "quantum Monte Carlo" (QMC) is used for a variety
of techniques. These include on the one hand techniques for thermal states
at finite temperature, namely path-integral Monte Carlo (Suzuki et al.'s
initial method), world-line Monte Carlo and stochastic series expansion
(refinements of it), and on the other hand techniques to find the T = 0 ground state,
namely variational Monte Carlo and Green's function Monte Carlo.
Variational Monte Carlo is a variational method using "classical" Monte Carlo
integration to evaluate the energy of a given parametrized trial wave
function. We shall briefly describe it in Sec. 6.6. Green's function Monte Carlo
uses imaginary time evolution to approximate a ground state. It achieves an
astounding accuracy at quite modest computational effort: Already in 1989,
the ground state properties of the Heisenberg antiferromagnet were
calculated, reaching a precision of more than 3 digits for the energy in the
thermodynamic limit, with the calculation time for a 12 × 12 lattice being a mere 3
hours on a supercomputer 8 [TC89]. Three more digits of precision, in the
energy as well as in other quantities, were achieved shortly afterwards
[Run92].

A current overview of the use of Monte Carlo techniques throughout
physics may be found in the book [LB05], which, however, restricts its
treatment of QMC to a brief one-chapter overview of the terms just
mentioned. Of these techniques, the modern variants of path-integral Monte Carlo
currently seem to enjoy the most interest in the community. This is mainly due
to the spectacular overcoming of the so-called "critical slowing down"
problem, which prevented the application of path-integral Monte Carlo to critical
or near-critical systems due to exponentially increasing convergence time.
(Also, these methods now reach very low temperatures, allowing them to
outperform dedicated T = 0 techniques in ground state studies.) The
problem of critical slowing down was overcome by adapting the solution for the
corresponding problem in classical Markov chain Monte Carlo (the so-called
cluster-flipping scheme [SW87]) by two clever constructions known as the
loop algorithm [ELM93] and the worm algorithm [PST98]. A review of these
inventions and their applications may be found in [KH04]. (Unfortunately,
Ref. [KH04] assumes the reader's familiarity with path-integral QMC as it
was done before the loop algorithm, which may be obtained from the older
reviews [RL85] or [RL92].) Alternatively, a reader in a hurry may find an
accessible and qualified, but quite short, introduction in part II of the thesis
[Weß05]. 9 The book [NU99] may also provide an overview of the variety of
modern quantum Monte Carlo schemes.

The ALPS project [ADG+05, AAC+07] is a recent effort to offer,
in a consistent framework, modern implementations of various path-integral
Monte Carlo techniques (including the worm, loop, and directed-loop
algorithms), as well as some non-Monte-Carlo schemes. This software package
also allows the non-expert physicist to experiment with these techniques.
The size of systems treatable with path-integral Monte Carlo is impressive, as
can be seen from some examples: Ref. [WATB04] studies the co-existence of
superfluid and Mott phases in a boson Hubbard gas in an optical lattice of
50 × 50 sites with inhomogeneous trapping potential, and achieves excellent

8 Note that the supercomputer that they used then achieved less than a fifth of the
performance of a standard PC of today.

9 These references chiefly treat the case of systems defined on lattices, which is what we
are concerned with in this part of the thesis. Quantum Monte Carlo techniques for
continuous systems (most successfully used for liquid helium near the superfluidity
transition) are reviewed e. g. in [Cep95].




precision. The 2D Heisenberg antiferromagnet, already mentioned earlier as
a standard example for a difficult spin system, has been treated for up to
200 × 200 spins with a precision suitable to get robust and precise
extrapolations to the thermodynamic limit [CRT+03].

QMC simulations of dynamical evolution in time have not yet progressed 
as far, but there are algorithms such as, e. g., those used in Refs. [EM94, OP05], 
which seem promising. 

The major drawback of all Monte Carlo techniques is the so-called sign
problem: The precision of Monte Carlo integration breaks down dramatically
if the integrand cannot be written with a positive probability density to
govern the Metropolis walk. In the case of quantum systems, this means that
due to their non-vanishing anticommutator, fermionic systems can only be
treated in special cases, and the treatment of frustrated systems is severely
hindered as well. This renders path-integral Monte Carlo calculations for
such systems almost impossible, and variational Monte Carlo calculations are
much less accurate than otherwise. (See [TW05] for a recent analysis of the
difficulties in overcoming this old problem.) For an example of how far one gets
despite this problem, see, e. g., newer work on the Heisenberg
antiferromagnet on frustrated lattices (e. g., [CTS00]) or on the fermionic Hubbard model
(e. g., [FKK06]).


6.6 Variational methods 

The variational method is not so much a method in its own right, but rather
a common theme which we find in many of the numerical techniques (and
hence we have already come across it several times in this chapter).
The basic idea is to use a "trial wave function" (or "test function") which
depends on a set of $K$ real parameters, i. e. is a map from $\mathbb{R}^K$ into the Hilbert
space of the system to be studied. As $K$ is much smaller than the dimension
of the Hilbert space (necessarily, as otherwise it would be too expensive to
determine all $K$ parameters), the manifold mapped out by these states is only
a very small part of the Hilbert space. Only a good choice of the map allows
one to find a state that is close to the real ground state. To this end, the energy
expectation value is written as a function of the $K$ parameters, and this function
is minimized. In Sec. 8.2, we shall come back to the question under which
conditions such a programme can succeed.

As the real ground state usually cannot be expected to lie in the
manifold of trial wave functions, one is bound to always overestimate the energy,
leading to an unavoidable source of systematic error. This is in contrast to
methods such as path-integral Monte Carlo, which are exact in the limit of
infinite sample size and constitute a controlled approximation, i. e., the uncertainty of
the result can be estimated reliably.
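
A toy example makes the systematic overestimate concrete. The sketch below takes two spins with H = -σ_z σ_z - B(σ_x ⊗ 1 + 1 ⊗ σ_x) and a one-parameter product trial state in which both spins are in cos(θ/2)|0> + sin(θ/2)|1>. The model, the trial family and the grid search are assumptions chosen for illustration. The exact ground-state energy of this two-spin model is -sqrt(1 + 4B²).

```python
import math


def trial_energy(theta, field):
    """Energy expectation value of the product trial state.

    For each spin <sigma_z> = cos(theta) and <sigma_x> = sin(theta), and
    expectation values factorize for a product state.
    """
    return -math.cos(theta) ** 2 - 2.0 * field * math.sin(theta)


def variational_minimum(field, grid_points=10_001):
    """Minimize the trial energy over theta in [0, pi] by a plain grid scan."""
    best_theta, best_energy = 0.0, trial_energy(0.0, field)
    for k in range(1, grid_points):
        theta = math.pi * k / (grid_points - 1)
        energy = trial_energy(theta, field)
        if energy < best_energy:
            best_theta, best_energy = theta, energy
    return best_theta, best_energy


def exact_ground_energy(field):
    """Exact ground-state energy of the two-spin model."""
    return -math.sqrt(1.0 + 4.0 * field ** 2)


if __name__ == "__main__":
    for field in (0.2, 0.5, 1.0, 2.0):
        _, e_var = variational_minimum(field)
        print(field, e_var, exact_ground_energy(field))
```

For every field value the variational energy lies above the exact one, because the entangled exact ground state is not contained in the product-state manifold.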

Many of the successful choices for trial wave functions are based on
the Jastrow wave function [Jas55]. Further common choices, both
ultimately based on Jastrow's work, are the Gutzwiller approximation [Gut63]
(reviewed in [Vol84]) and the resonating valence-bond states (see, e. g.,
[LDA88]).

For simple cases, the expression for the energy as a function of the trial state
parameters can be minimized analytically or with exact numerics. Matrix
product states and their variants discussed in Sec. 6.4 are instances of rather
sophisticated trial functions that still allow an exact evaluation, or a good
approximation (in the case of PEPS), of their energy. Another possibility is
the variational Monte Carlo (VMC) technique. There, the energy is an integral
or sum over the basis set or configuration space and is evaluated with the
Metropolis algorithm. This was first demonstrated in a calculation for liquid
helium [McM65], and later shown to give good results even for fermionic
systems [CCK77, YS87].

A very recent idea, dubbed "string states", is to combine these
approaches: VMC is not used, as usual, with Jastrow-type or similar wave
functions, but rather with variants of matrix product states augmented such
that they have entanglement properties favorable for 2D or 3D geometries,
at the price that an evaluation of their energy expectation value is only
possible with Metropolis sampling [SWVC07b] (see also [SV07]). In this
work, a clever strategy for sample re-use is devised that still allows the use of
the generalized-eigenvalue technique described in Sec. 6.4.2 with its good
convergence properties.

From the viewpoint of quantum information theory, an interesting
alternative way to look at variation emerges, suggested in [DEO07]: Instead of
minimizing within the manifold of variational states, one looks at the
unitary operations on the Hilbert space that can map a fixed trivial initial state
to a trial state. One chooses a class of unitaries that may be decomposed into
a quantum-computational circuit consisting of quantum gates, and then uses
knowledge from the theory of quantum computation and also of optimal
control theory to find a good algorithm to optimize the circuit in order to yield a
state of minimal energy.

In the following two chapters we present and explore a novel class of
variational states, which are based on a generalization of the graph states
that we have used in Part I. This generalization, called weighted graph states,
is based on the application of commuting unitary operators on all pairs of
spins of a product state. The phases in these operators, together with the
parametrization of the product state, form the set of parameters that is varied to
minimize the energy.

Abstract

We introduce a variational method for the approximation of ground states
of strongly interacting spin systems in arbitrary geometries and spatial
dimensions. The approach is based on weighted graph states and
superpositions thereof. These states allow for the efficient computation of all
local observables (e. g. energy) and include states with diverging correlation
length and unbounded multi-particle entanglement. As a demonstration we
apply our approach to the Ising model on 1D, 2D and 3D square-lattices. We
also present generalizations to higher spins and continuous-variable systems,
which allow for the investigation of lattice field theories.

1 This publication and the paper of Chapter 8 are reprinted in chronological order. The
reader may, however, find the matter more accessible if (s)he reads Chapter 8 before the
present one. (This one, being in letter format, is very dense, while the other one explains
the same subject in a more comprehensive and extended style.)

[PACS: 03.67.Mn, 02.70.-c, 75.40.Mg, 75.10.Jm]


7.1 Introduction 

Strongly correlated quantum systems are of central interest in several areas
of physics. Exotic materials such as high-$T_c$ superconductors and quantum
magnets exhibit their remarkable properties due to strong quantum
correlations, and experimental breakthroughs with e.g. atomic gases in optical
lattices provide a perfect playground for probing strongly correlated quantum
systems. The main obstacle in understanding the behavior of those
quantum systems is the difficulty in simulating the effective Hamiltonians that
describe their properties. In most cases, the strong correlations in the
exponentially large Hilbert space render an exact solution infeasible, and
attacking the problem by numerical means requires sophisticated techniques such
as quantum Monte Carlo (QMC) methods or the density matrix
renormalization group (DMRG) approach [Whi92, Whi93, Sch05].

QMC methods suffer from the sign problem, which makes them
inappropriate for the description of fermionic and frustrated quantum systems.
DMRG is a variational approach that provides approximations to ground
states, thermal states and dynamics of many-body systems. Recent insights
from entanglement theory have led to an improved understanding of both
the success and the limitations of this approach. Indeed, the accuracy of the
method is closely linked to the amount of entanglement in the approximated
states [Vid03, Vid04, VPC04]. Matrix product states [FNW92], which provide
the structure underlying DMRG, are essentially one-dimensional and the
entanglement entropy of these states is limited by the dimension $D$ of the
matrices, which in turn is directly linked to the computational cost [Vid03, Vid04,
Sch05]. Hence a successful treatment of systems with bounded entanglement,
e.g. one-dimensional, non-critical spin systems with short range interactions,
is possible, while the method is inefficient for systems with an unbounded
amount of entanglement, e.g. critical systems and systems in two or more
dimensions. Promising generalizations that can deal with higher dimensional
systems have been reported recently [VC04a, Vid05]. However, the
computational effort and complexity increase with the dimension of the system. In
addition, the amount of block-wise entanglement of the states used in Ref.
[VC04a] still scales at most in proportion to the surface of a block of spins,
whereas in general a scaling in proportion to the volume of the block is
possible. Such a scaling can in fact be observed for disordered systems [CHDB05]
or systems with long-range interactions [DHH+05].

Here we introduce a new variational method using states with intrinsic
long-range entanglement and no bias towards a geometry to overcome these
limitations. We first illustrate our methods for spin-1/2 systems, and then
generalize them to arbitrary spins and infinite dimensional systems such as
harmonic oscillators. In finite dimensions, the method is based on a certain
class of multiparticle-entangled spin states, weighted graph states (WGS), and
superpositions thereof. WGS are an $O(N^2)$-parameter family of $N$-spin states
with the following properties: (i) they form an (overcomplete) basis, i.e. any
state can be represented as a superposition of WGS; (ii) one can efficiently
calculate expectation values of any localized observable, including energy,
for any WGS; (iii) they correspond to weighted graphs which are
independent of the geometry and hence adaptable to arbitrary geometries and
spatial dimensions; (iv) the amount of entanglement contained in WGS may be
arbitrarily high, in the sense that the entanglement between any block of $N_A$
particles and the remaining system may be $O(N_A)$ and the correlation length
may diverge.

Note that (iii) and (iv) are key properties in which this approach differs
from DMRG and its generalizations and which suggest a potential for
enhanced performance at least in certain situations, while (ii) is necessary to
efficiently perform variations over this family. In the following we will
outline how we use superpositions of a small number of WGS as variational
ansatz states to find approximations to ground states of strongly interacting
spin systems in arbitrary spatial dimension.

7.2 Properties of WGS 

WGS are defined as states of $N$ spin-1/2 systems (or qubits) that result from
applying phase gates $U_{ab}(\varphi_{ab}) = \operatorname{diag}(1, 1, 1, e^{-i\varphi_{ab}})$ onto each pair of qubits
$a, b \in \{1, 2, \dots, N\}$ of a tensor product of $\sigma_x$-eigenstates $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$,
followed by a single-qubit filtering operation $D_a = \operatorname{diag}(1, e^{d_a})$, $d_a \in \mathbb{C}$, and a
general unitary operation $U_a$:

$$|\Psi_{\Gamma,\mathbf{d},U}\rangle \propto \prod_{a=1}^{N} U_a D_a \prod_{b=a+1}^{N} U_{ab}(\varphi_{ab})\, |+\rangle^{\otimes N}. \qquad (7.1)$$

The phases $\varphi_{ab}$ can be associated with a weighted graph with a real
symmetric adjacency matrix $\Gamma_{ab} = \varphi_{ab}$. For convenience, we define a
deformation vector $\mathbf{d} = (d_1, d_2, \dots, d_N)$ and $U = \bigotimes_a U_a$. The deformations make
WGS as used in this letter slightly more general than the WGS used in Refs.
[CHDB05, DHH+05], where $d_a = 0$. One can conveniently rewrite Eq. (7.1) as

$$|\Psi_{\Gamma,\mathbf{d},U}\rangle \propto U \sum_{\mathbf{s}} e^{-i \mathbf{s}^T \Gamma \mathbf{s}/2 + \mathbf{d}^T \mathbf{s}}\, |\mathbf{s}\rangle, \qquad (7.2)$$

where the sum runs over all computational basis states, which are labelled
with the binary vector $\mathbf{s} = (s_1, s_2, \dots, s_N)^T$. Our class of variational states
comprises superpositions of WGS of the form

$$|\Psi\rangle \propto \sum_{i=1}^{m} \alpha_i\, |\Psi_{\Gamma,\mathbf{d}^{(i)},U}\rangle, \qquad (7.3)$$

i. e. the superposed states differ only in their deformation vector $\mathbf{d}^{(i)}$, while
the adjacency matrix $\Gamma$ and the unitary $U$ are fixed. Such a state is specified
by $N(N-1)/2 + 3N + 2(N+1)m = O(N^2)$ real parameters.
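
The equivalence of the gate-by-gate definition, Eq. (7.1), and the closed form, Eq. (7.2), can be checked numerically for a few spins. The sketch below does so under the simplifying assumption that all local unitaries U_a are the identity; the function names, the dictionary-based state representation and the sample values of Γ and d are hypothetical choices for illustration.

```python
import cmath
import itertools
import math


def wgs_from_gates(gamma, d):
    """Apply phase gates and deformations to |+>^N as in Eq. (7.1), with all U_a = 1."""
    n = len(d)
    basis = list(itertools.product((0, 1), repeat=n))
    # |+>^N has equal amplitudes on all computational basis states.
    amps = {s: complex(2.0 ** (-n / 2.0)) for s in basis}
    # Phase gate diag(1, 1, 1, exp(-i phi_ab)) acts only when s_a = s_b = 1.
    for a in range(n):
        for b in range(a + 1, n):
            for s in basis:
                if s[a] == 1 and s[b] == 1:
                    amps[s] *= cmath.exp(-1j * gamma[a][b])
    # Deformation diag(1, exp(d_a)) acts only when s_a = 1.
    for a in range(n):
        for s in basis:
            if s[a] == 1:
                amps[s] *= cmath.exp(d[a])
    return amps


def wgs_closed_form(gamma, d):
    """Amplitudes exp(-i s^T Gamma s / 2 + d^T s) as in Eq. (7.2), with U = 1."""
    n = len(d)
    amps = {}
    for s in itertools.product((0, 1), repeat=n):
        quad = sum(gamma[a][b] * s[a] * s[b] for a in range(n) for b in range(n))
        lin = sum(d[a] * s[a] for a in range(n))
        amps[s] = cmath.exp(-0.5j * quad + lin)
    return amps


def normalized(amps):
    """Rescale a dictionary of amplitudes to unit norm."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in amps.values()))
    return {s: v / norm for s, v in amps.items()}


def parameter_count(n, m):
    """Number of real parameters N(N-1)/2 + 3N + 2(N+1)m quoted in the text."""
    return n * (n - 1) // 2 + 3 * n + 2 * (n + 1) * m


if __name__ == "__main__":
    gamma = [[0.0, 0.3, 1.1], [0.3, 0.0, -0.7], [1.1, -0.7, 0.0]]
    d = [0.2 + 0.5j, -0.1j, 0.4]
    a = normalized(wgs_from_gates(gamma, d))
    b = normalized(wgs_closed_form(gamma, d))
    print(max(abs(a[s] - b[s]) for s in a))
    print(parameter_count(3, 2))
```

The explicit state vector has 2^N entries and is only usable for small N; the point of the formalism is that local observables can be obtained without ever constructing it.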

We now proceed to verify the properties set out in the introduction. For
property (i), observe that for any fixed $\Gamma$ and $U$, all possible combinations of
$D_a \in \{\mathbb{1}, \sigma_z^{(a)}\}$ lead to an orthonormal basis (note that the $\sigma_z^{(a)}$ commute
with $U_{ab}$). Hence any state $|\Psi\rangle$ can be written in the form of Eq. (7.3) for
sufficiently large $m \le 2^N$, which shows the exhaustiveness of the description.

The relevance of employing deformations lies in the observation that only
$|\Psi\rangle$ of the form of Eq. (7.3) permit the efficient evaluation of the expectation
values of localized observables $A$, i.e. satisfy property (ii). For simplicity we
restrict our attention to observables of the form

$$A = \sum_{a<b} A_{ab} + \sum_{a} A_a, \qquad (7.4)$$

where $A_{ab}$ has support on the two spins $a, b$. The method can be easily
adapted to any observable that is a sum of terms with bounded support.
To compute $\operatorname{tr}(A|\Psi\rangle\langle\Psi|) = \sum_{a<b} \operatorname{tr}(A_{ab}\rho_{ab}) + \sum_a \operatorname{tr}(A_a \rho_a)$ it is sufficient to
determine the reduced density operators $\rho_{ab}$ and $\rho_a$.

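For a single WGS ($m = 1$) we obtain $\rho_{12} = (U_1 \otimes U_2) \left( \sum_{\mathbf{s},\mathbf{t}} r_{\mathbf{s},\mathbf{t}}\, |\mathbf{s}\rangle\langle\mathbf{t}| \right) (U_1 \otimes U_2)^\dagger$
with

$$r_{\mathbf{s},\mathbf{t}} = e^{-i\gamma} \prod_{c=3}^{N} \left( 1 + e^{d_c + d_c^* - i \sum_{e=1}^{2} (s_e - t_e) \Gamma_{ec}} \right) \qquad (7.5)$$

and $\gamma = \tfrac{1}{2} \sum_{a,b=1}^{2} \Gamma_{ab} (s_a s_b - t_a t_b) + i \sum_{a=1}^{2} (d_a s_a + d_a^* t_a)$. This generalizes the
formula for WGS without deformation obtained in Ref. [DHH+05]. Eq. (7.5)
demonstrates that for any WGS, the reduced density operator of two (and
one) spins can be calculated with a number of operations that is linear in the
system size $N$, as opposed to an exponential cost for a general state.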
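
The linear-cost claim can be illustrated by comparing a brute-force partial trace with a product over the traced-out spins. The sketch below works with unnormalized amplitudes exp(-i Σ_{a<b} Γ_ab s_a s_b + Σ_a d_a s_a) and assumes U_1 = U_2 = 1; the product formula in the code is obtained by summing each traced-out spin over its two values, and all function names and test values are illustrative.

```python
import cmath
import itertools


def amplitude(gamma, d, s):
    """Unnormalized WGS amplitude exp(-i sum_{a<b} Gamma_ab s_a s_b + sum_a d_a s_a)."""
    n = len(d)
    phase = sum(gamma[a][b] * s[a] * s[b] for a in range(n) for b in range(a + 1, n))
    return cmath.exp(-1j * phase + sum(d[a] * s[a] for a in range(n)))


def rho12_brute_force(gamma, d):
    """Trace out spins 3..N explicitly; the cost grows as 2^N."""
    n = len(d)
    pairs = list(itertools.product((0, 1), repeat=2))
    rho = {}
    for s in pairs:
        for t in pairs:
            rho[s, t] = sum(
                amplitude(gamma, d, s + rest) * amplitude(gamma, d, t + rest).conjugate()
                for rest in itertools.product((0, 1), repeat=n - 2)
            )
    return rho


def rho12_product_formula(gamma, d):
    """Same matrix elements from a product over the traced-out spins; cost linear in N."""
    n = len(d)
    pairs = list(itertools.product((0, 1), repeat=2))
    rho = {}
    for s in pairs:
        for t in pairs:
            # Contribution of the two retained spins.
            value = cmath.exp(
                -1j * gamma[0][1] * (s[0] * s[1] - t[0] * t[1])
                + sum(d[a] * s[a] + d[a].conjugate() * t[a] for a in range(2))
            )
            # Each traced-out spin c contributes one factor (sum over s_c = 0, 1).
            for c in range(2, n):
                coupling = sum((s[e] - t[e]) * gamma[e][c] for e in range(2))
                value *= 1 + cmath.exp(2 * d[c].real - 1j * coupling)
            rho[s, t] = value
    return rho


if __name__ == "__main__":
    n = 5
    gamma = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            gamma[a][b] = gamma[b][a] = 0.1 * (a + 1) * (b + 2)
    d = [complex(0.1 * a, 0.05 - 0.2 * a) for a in range(n)]
    brute = rho12_brute_force(gamma, d)
    fast = rho12_product_formula(gamma, d)
    print(max(abs(brute[k] - fast[k]) for k in brute))
```

The brute-force routine visits 2^(N-2) configurations per matrix element, while the product formula needs N-2 factors, which is what makes the energy evaluation feasible for large systems.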

A straightforward generalization of Eq. (7.5) allows one to calculate
two-qubit reduced density matrices for superpositions of the form of Eq. (7.3)
in time $O(m^2 N)$. Therefore the expectation value of an observable $A$ of the
form of Eq. (7.4) with $K$ terms requires $O(m^2 K N)$ steps. This implies that
even for Hamiltonians where all spins interact pairwise (and randomly), i.e.
$K = N(N-1)/2$, the expectation value of the energy for our ansatz states
can be obtained in $O(m^2 N^3)$ steps. For short-range interaction
Hamiltonians, this reduces to $O(m^2 N^2)$. The total number of parameters (and
memory cost) scales as $O(N^2 + mN)$, which can be further reduced by employing
symmetries.

The adjacency matrix $\Gamma$, containing the interaction phases $\varphi_{ab}$, reflects the
entanglement properties and the geometry of the system. For instance, a state
corresponding to a linear cluster state [BR01] will have only $\Gamma_{a,a+1} \neq 0$, while
$\Gamma_{a,a+l} \neq 0$ would correspond to longer-ranged correlations. Different values
of $\varphi_{ab}$ lead to very different (entanglement) properties: For $\varphi_{ab} = |x_a - x_b|^{-\beta}$,
where $x_a$ denotes the spatial coordinates of spin $a$, one obtains states with
diverging correlation length for two-point correlations, while block-wise
entanglement can either be bounded or grow unboundedly, depending on the
value of $\beta$ [DHH+05]. As $\Gamma$ may have arbitrary structure, it can also reflect
complicated geometries on lattices in higher spatial dimensions.


7.3 Variational method 


Any state of the form Eq. (7.3) with $m = \operatorname{poly}(N)$ permits the efficient
calculation of expectation values of any two-body Hamiltonian $H$. A good
approximation to the ground state is then obtained by numerical optimization
of the parameters characterizing the state, i.e. the $O(N^2 + Nm)$ real numbers
describing $\Gamma$, $U$, $\alpha_j$, and $\mathbf{d}^{(j)}$. Starting from random values, one descends
to the nearest energy minimum using a general local minimizer (we used
L-BFGS [BLN95]). Another approach that we found to work well is to keep all
parameters fixed except for either those corresponding to (i) one local
unitary $U_a$, (ii) one phase gate $U_{ab}(\varphi_{ab})$ or (iii) the deformation vector $d_a^{(j)}$ for
one site $a$. In each case, the energy as a function of this subset of parameters
turns out to be a quotient of quadratic forms, which can be optimized using
the generalized-eigenvalue (Rayleigh) method. A similar result holds for the
superposition coefficients $\alpha_j$. One then optimizes with respect to these
subsets of parameters in turn until convergence is achieved. If one increases $m$
stepwise, one (somewhat surprisingly) does not get stuck in local minima.
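
For the superposition coefficients, the quotient-of-quadratic-forms structure can be made explicit. The abbreviations $|\Psi_i\rangle$, $\mathcal{H}_{ij}$ and $\mathcal{N}_{ij}$ below are introduced for this sketch only.

```latex
\begin{align}
|\Psi\rangle &= \sum_{i=1}^{m} \alpha_i\, |\Psi_i\rangle ,
\qquad |\Psi_i\rangle \equiv |\Psi_{\Gamma,\mathbf d^{(i)},U}\rangle , \\
E(\boldsymbol\alpha) &= \frac{\langle\Psi|H|\Psi\rangle}{\langle\Psi|\Psi\rangle}
= \frac{\sum_{i,j} \bar\alpha_i\, \mathcal H_{ij}\, \alpha_j}{\sum_{i,j} \bar\alpha_i\, \mathcal N_{ij}\, \alpha_j} ,
\qquad
\mathcal H_{ij} = \langle\Psi_i|H|\Psi_j\rangle , \quad
\mathcal N_{ij} = \langle\Psi_i|\Psi_j\rangle , \\
\min_{\boldsymbol\alpha} E &= \lambda_{\min}
\quad\text{with}\quad
\mathcal H\, \boldsymbol\alpha = \lambda\, \mathcal N\, \boldsymbol\alpha .
\end{align}
```

With all other parameters held fixed, the optimal coefficient vector is therefore the generalized eigenvector belonging to the smallest generalized eigenvalue of an $m \times m$ problem.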

A significant reduction of the number of parameters and the
computational costs may be achieved by exploiting symmetries, or by adapting $\Gamma$ to
reflect the geometrical situation. For instance, for systems with short range
interactions and finite correlation length, one might restrict the range of the
weighted graph, i.e. $\Gamma_{ab} = 0$ if $|x_a - x_b| > r_0$. This reduces the number of
parameters describing the WGS from $O(N^2)$ to $O(N)$. For translationally
invariant Hamiltonians, a better scheme is to let $\Gamma_{ab}$ depend only on $|x_a - x_b|$. This
reduces the number of parameters to $O(N)$ as well, and it seems to hardly
affect the accuracy of the ground state approximation. Hence, it allows one
to reach high numbers of spins $N$ and thus to study also 2D and 3D systems
of significant size. Trading accuracy for high speed one may even use a fully
translation-invariant ansatz, where also $D_a$ and $U_a$ are constant and
independent of $a$. In the latter case, for Hamiltonians with only nearest-neighbor
interactions, the expectation value of the energy can be obtained by
calculating only a single reduced density operator, and the computational cost to treat
2D [and 3D] systems of size $N = L^2$ [$N = L^3$] turns out to be of $O(L)$ rather
than $O(N)$.

7.4 Demonstration: The Ising model

Our method allows us to determine, with only moderate computational cost,
an upper bound on the ground state energy of a strongly interacting system
of arbitrary geometry. Together with the Anderson lower bound, one can
hence obtain a quite narrow interval for the ground state energy and observe
qualitative features of the ground state. 2 To illustrate our method, we have
applied it to the Ising model in 1D, 2D and 3D with periodic boundary
conditions, described by the Hamiltonian

$$H = -\sum_{\langle a,b\rangle} \sigma_z^{(a)} \sigma_z^{(b)} - B \sum_{a} \sigma_x^{(a)}, \qquad (7.6)$$


where $\langle a,b\rangle$ denotes nearest neighbors. For a spin chain with $N = 20$, and
a 2D lattice of size 4 × 4 we compared our numerical ground state
approximation with exact results (Fig. 7.1a). We have also performed calculations for
larger 2D systems up to 14 × 14. We note that the accuracy can be further
improved by increasing $m$ (see Fig. 7.1b). In fact our numerical results suggest
an exponential improvement with $m$. We have also tested the fully translation
invariant ansatz with distance dependent phases, constant $d_a$ and alternating
$U_a$ for 1D, 2D and 3D systems of size $N = 30$, $N = 900$ and $N = 27000$
respectively (see Fig. 7.2). There, for lack of a reference value for the exact ground
state, we compare with the Anderson bound obtained by calculating the
exact ground state energy $E_A$ for system size $N = 15, 3^2, 2^3$ respectively. In the
2D and especially the 3D case it is not expected that the Anderson bound
is particularly tight, and it may lead to a significant underestimation of the
precision achieved by our approach. The states approximated with this
simple ansatz also show qualitatively essential features of the exact ground state.
As an example, the maximal two-point correlation function $Q^{a,b}_{\max}$ (where the
two-point correlation functions are defined as $Q^{a,b}_{\alpha,\beta} = \langle \sigma_\alpha^{(a)} \sigma_\beta^{(b)} \rangle - \langle \sigma_\alpha^{(a)} \rangle \langle \sigma_\beta^{(b)} \rangle$)
is plotted against the magnetic field $B$ in Fig. 7.2b. Strong indication for the
occurrence of a phase transition can be observed: the correlations
significantly increase around $B \approx 1.1, 3.12, 5.22$ in 1D, 2D, 3D respectively. This
is in good agreement with estimates employing sophisticated power series
expansions for the infinite systems or Padé approximants based on large scale
numerical simulations, which expect the critical points at $B = 1, 3.04, 5.14$
[HHO90, WOH94]. We also remark that the approximated states show a
scaling of block-wise entanglement proportional to the surface of the block, i. e.
$S_{N_A} \approx \beta_B L^{\dim-1}$, where $\beta_B$ is some constant depending on magnetic field $B$,
$N_A = L^{\dim}$ and dim is the spatial dimension. We can estimate $\beta_B$ and find
that it significantly increases near the critical point.

2 We remark that for 1D systems, the accuracies appear to scale less well with the resources
than for DMRG methods. However, our approach yields accurate results also for 2D and 3D
systems.

Figure 7.1: (Color online.) (a) Relative deviation from exact ground state
energy for Ising chain with N = 20 (blue) and 4 × 4 2D lattice (green) with
periodic boundary conditions as function of magnetic field B (calculated
using BFGS minimization with summarised phases, m < 6). (b) 1D Ising chain
with N = 20. Improvement of relative deviation from ground state energy as
function of number m of superposed states for various field values B
(calculated using Rayleigh minimization without summarised phases).






Figure 7.2: (Color online.) Ising model in 1D (blue) with N = 30, 2D (green)
with N = 30 × 30 = 900 and 3D (red) with N = 30 × 30 × 30 = 27000 spins
arranged as chain, square, and cubic lattice, respectively, for fully symmetric
ansatz states with $\varphi_{ab} = f(|x_a - x_b|)$, $d_a = 1$ as function of
magnetic field B/dim, where dim is the dimension of the lattice. (a) Relative
deviation of the ground state energy per bond from the mean field approximation
$E_{\mathrm{MF}}$, $(E_{\mathrm{MF}} - E)/E_{\mathrm{MF}}$ (solid), and of the
Anderson bound, $(E_{\mathrm{MF}} - E_A)/E_{\mathrm{MF}}$ (dashed). Translational
invariance is reduced by using $U_1$, $U_2$ (alternating). (b) Maximal two-point
correlation $Q^{a,a+1}_{\max}$ for nearest neighbors.
7.5 Generalizations

Our approach can be adapted directly to spin-$n/2$ systems using the
representation Eq. (7.2). There the sum over binary vectors s with $s_i = 0, 1$
has to be changed to n-ary vectors s with $s_i = 0, 1, \dots, n - 1$ and the
corresponding matrices/vectors Γ, d, U have to be modified accordingly. However,
the limit $n \to \infty$ to infinite dimensional systems is both problematic and
impractical, as the computational effort increases with n. For continuous
variable systems we thus choose a closely related but slightly different approach.

The description of field theories on lattices generally leads to
infinite-dimensional subsystems such as harmonic oscillators. A Klein-Gordon
field on a lattice, for example, possesses a Hamiltonian quadratic in position
and momentum operators X and P whose ground state is Gaussian [AEPW02,
PEDC05]. This suggests that techniques from the theory of Gaussian state
entanglement (see [EP03] for more details) provide the most natural setting for
these problems. To this end, consider N harmonic oscillators and the vector
$R = (R_1, \dots, R_{2N})^T = (X_1, P_1, \dots, X_N, P_N)^T$. The canonical
commutation relations then take the form $[R_j, R_k] = i\sigma_{jk}$ with the
symplectic matrix $\sigma$. All information contained in a quantum state $\rho$
can then be expressed equivalently in terms of the characteristic function
$\chi_\rho(\xi) = \mathrm{tr}[\rho W(\xi)]$, where $\xi \in \mathbb{R}^{2N}$ and
$W(\xi) = \exp(i \xi^T \sigma R)$. Then, expectation values of polynomials of X
and P can be obtained as derivatives of $\chi$. For Gaussian states, i.e. states
whose characteristic function is a Gaussian,
$\chi_\rho(\xi) = \chi_\rho(0)\, e^{-\frac{1}{4} \xi^T \gamma \xi + D^T \xi}$, where
$\gamma$ is a $2N \times 2N$ matrix and $D \in \mathbb{R}^{2N}$ is a vector, these
expectation values can be expressed efficiently as polynomials in $\gamma$ and D.
On the level of wave functions a pure Gaussian state is given by
$|F, G; a\rangle = C \int_{\mathbb{R}^N} d^N x\, e^{-\frac{1}{2} x^T (F - iG) x + a^T x} |x\rangle$,
where F and G are real symmetric matrices, a is a vector, C is the
normalization and



(7.7) 


Now, we may consider coherent superpositions
$|\psi\rangle = \sum_i \alpha_i |F_i, G_i; a_i\rangle$ to obtain refined
approximations of a ground state. These do not possess a Gaussian
characteristic function, but a lengthy yet straightforward computation
reveals that the corresponding characteristic function
$\chi_{|\psi\rangle\langle\psi|}(\xi)$ is a sum of Gaussian functions with complex
weights. It is then immediately evident that in this description we retain the
ability to evaluate efficiently all expectation values of polynomials in X and P.
This allows one to establish an efficient algorithm for the approximation of
ground state properties of lattice Hamiltonians that are polynomial in X and P.
7.6 Summary and Outlook

We have introduced a new variational method based on deformed weighted 
graph states to determine approximations to ground states of strongly inter¬ 
acting spin systems. The possibility to compute expectation values of local 
observables efficiently, together with entanglement features similar to those 
found in critical systems, make these states promising candidates to approx¬ 
imate essential features of ground states for systems with short range inter¬ 
actions in arbitrary geometries and spatial dimensions. One can also gener¬ 
alize this approach to describe the dynamics of such systems, systems with 
long range interactions, disordered systems, dissipative systems, systems at 
finite temperature and with infinite dimensional constituents. In fact, gener¬ 
alizations of our method that deal with these issues are possible and will be 
reported elsewhere. 

We thank J. I. Cirac for valuable discussions and J. Eisert for suggest¬ 
ing the use of the Anderson bound. This work was supported by the 
FWF, the QIP-IRC funded by EPSRC (GR/S82176/0), the European Union 
(QUPRODIS, OLAQUI, SCALA, QAP), the DFG, the Leverhulme Trust (F/07 
058/U), the Royal Society, and the ÖAW through project APART (W. D.).
Some of the calculations have been carried out using facilities of the
University of Innsbruck's Konsortium Hochleistungsrechnen.
Abstract

In a recent article [Phys. Rev. Lett. 97 (2006), 107206], we have presented a 
class of states which is suitable as a variational set to find ground states in 
spin systems of arbitrary spatial dimension and with long-range entangle¬ 
ment. Here, we continue the exposition of our technique, extend from spin 
1/2 to higher spins and use the boson Hubbard model as a non-trivial exam¬ 
ple to demonstrate our scheme. 

[PACS: 02.70.C, 05.30.Jp, 03.67.Mn, 75.10.Jm, 75.40.Mg] 


8.1 Introduction 

Spins or harmonic oscillators on a lattice form a class of models which have 
been studied intensively in statistical physics. Understanding them is the 
key to many problems in condensed matter systems, especially regarding 
magnetic phenomena but also electrical and heat conduction and many other 
aspects. As the importance of quantum phase transitions [Sac99, Voj03] has
been more and more realized, interest in the ground states of quantum spin
models grew. While the relevance of entanglement for quantum phase
transitions was initially not fully appreciated, it is now a lively area of research
(e. g., [OAFF02, VLRK03]), and many researchers feel that paying explicit
attention to entanglement features is vital for further progress in numerical
methods for the treatment of spin models [ON02, VPC04, Lat07].
Although quantum phase transitions nominally only occur at zero temperature,
their presence has great influence on the system properties at finite
temperature, namely leading to the break-down of quasiparticle descriptions. Hence,
studying the ground state of spin models holds promise for understanding
experimentally observed features of such systems, not the least of which is
high-temperature superconductivity. Finally, spin models (including bosons
on a lattice) are an ideal way to model optical lattices, which are currently 
researched with exciting successes in theory and experiment (reviewed in 
[LSA+07]). 

While there are some exactly solvable spin models in one spatial 
dimension [Tak99], for nearly all models in higher dimensions approxi¬ 
mative techniques have to be used. A variety of quite different techniques 
have been developed: Most prominently, these are quantum Monte Carlo 
techniques, where recent progress has been achieved especially in the 
context of the so-called world-line Monte Carlo methods ([ELM93, PST98], 
reviewed in [KH04]). For one-dimensional systems, extraordinary accu¬ 
racy has become possible with the density matrix renormalisation group 
(DMRG) algorithm ([Whi92, Whi93], review [Sch05]). Recently, this
algorithm was extended to allow the calculation of not only ground state
properties but also of thermal states [ZV04, VGC04] and time evolutions
[LXW03, MMN05, Vid04, DKSV04]. Also, an extension to higher spatial
dimensions has been proposed [VC04a]. Its usability in practice has been
demonstrated only very recently [MVC07].

All these variations of DMRG are based on the same class of variational¹
states, namely matrix product states [R097]. We have recently found
[APD+06] that another class of states, namely the so-called weighted graph
states (WGS), first studied in a different context in [DHH+05, HCDB05], is
also quite promising as an ansatz for the variational approximation of ground
states of spin systems. Its particular advantage is the unlimited amount
of entanglement that can be present. Hence, we consider our technique
especially promising for systems with long-range entanglement such as
critical systems. A further key difference of our states to matrix product
states is that their mathematical structure does not reflect any spatial
geometry (while the product of matrices in a matrix product state reflects a chain
or ring geometry, as studied especially in [R097, VPC04]) and hence may
be expected to be equally suitable for higher dimensions (2D or 3D) as for
1D. Hence, even though we probably cannot compete with the astounding
accuracy of DMRG in 1D, we aim to provide a complementary alternative
to the higher-dimension generalisations of DMRG [VC04a]. In [APD+06],
we presented this technique and demonstrated its use for simple spin-1/2
systems in one and two dimensions. In the present article we explain our
method in much more detail, show new results we have obtained since
then (especially regarding the treatment of spins higher than spin-1/2, and
concerning heuristics to perform the minimizations) and test its usefulness
on practical examples. The article is self-contained and does not assume the
reader's familiarity with weighted graph states or the content of [APD+06].

1 Strictly speaking, only the fixed-length phase of DMRG can be called a variational
method.

This article is organised as follows: We start in Sec. 8.2 by reviewing some 
general observations about variational methods. In Sec. 8.3 we describe our 
class of variational states as a generalisation of weighted graph states and 
discuss their parametrisation. Section 8.4 explains how reduced density ma¬ 
trices of these states are calculated in an efficient manner in order to be able 
to evaluate expectation values of observables, including energy. To test our 
method, we show results for calculations on two different models (namely 
the XY model and the Bose-Hubbard model) in Sec. 8.5. In a variational 
method, a crucial part is finding a state within the given class that minimises 
the energy as well as possible. Our techniques for doing so are the topic of 
Sec. 8.6. We add some further notes on the details of our numerical
implementation and its performance (Sec. 3.5), and finish with a conclusion and an
outlook on further work (Sec. 8.7).

8.2 General considerations on variation 

For a Hamiltonian H that is too large to diagonalise one can approximate the
ground state using the Rayleigh-Ritz variational method. One uses a family
of states $|\Psi(x)\rangle$ which depend on some parameter x. It may be better
to see this as a map from a parameter space $\mathbb{R}^K$ to a Hilbert space
$\mathcal{H}$:

$$\Psi : \mathbb{R}^K \to \mathcal{H}, \quad x \mapsto |\Psi(x)\rangle.$$

One then solves the minimisation problem

$$E_{\min} = \min_{x \in \mathbb{R}^K} E(x), \quad \text{with} \quad E(x) = \frac{\langle \Psi(x) | H | \Psi(x) \rangle}{\langle \Psi(x) | \Psi(x) \rangle} \tag{8.1}$$

in order to obtain an upper bound $E_{\min}$ to the ground state energy and an
approximation $|\Psi(x_{\min})\rangle$ for the ground state.
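
As a minimal sketch of the minimisation problem (8.1), the following example evaluates the Rayleigh quotient for a toy system that is not taken from the text: two spins with the assumed Hamiltonian H = -Z⊗Z - h(X⊗1 + 1⊗X) and a one-parameter product ansatz. The field value, the ansatz and all function names are illustrative assumptions. Because the ansatz contains no entanglement, its optimum stays strictly above the exact ground state energy, which illustrates that E_min is only an upper bound.

```python
import math

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def lincomb(A, B, ca, cb):
    """Entry-wise linear combination ca*A + cb*B."""
    return [[ca * a + cb * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]

h = 0.5  # transverse field strength (illustrative value)
# Toy Hamiltonian H = -Z(x)Z - h * (X(x)1 + 1(x)X) on two spins.
H = lincomb(kron(Z, Z), lincomb(kron(X, I2), kron(I2, X), 1.0, 1.0), -1.0, -h)

def trial_state(theta):
    """Product ansatz: both spins in cos(theta)|0> + sin(theta)|1>."""
    one = [math.cos(theta), math.sin(theta)]
    return [a * b for a in one for b in one]

def energy(theta):
    """Rayleigh quotient E(theta) = <Psi|H|Psi> / <Psi|Psi>, cf. Eq. (8.1)."""
    v = trial_state(theta)
    Hv = [sum(hij * vj for hij, vj in zip(row, v)) for row in H]
    return sum(vi * hvi for vi, hvi in zip(v, Hv)) / sum(vi * vi for vi in v)

# Crude minimisation: scan the single parameter on a fine grid.
grid = [k * math.pi / 20000 for k in range(20000)]
theta_min = min(grid, key=energy)
E_min = energy(theta_min)

# Exact ground state energy of this two-spin model, for comparison.
E_exact = -math.sqrt(1.0 + 4.0 * h * h)
```

For h = 0.5 the scan settles at an energy of about -1.25, while the exact value is -√2; the gap is the price of the restricted variational class.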

For this to give good results, the map Y has to fulfil the following condi¬ 
tions: 

(i) There must be an efficient algorithm to calculate the expectation value
of observables for any state $\Psi(x)$. In principle, it is sufficient to be able to
calculate $\langle \Psi(x) | H | \Psi(x) \rangle$ and $\langle \Psi(x) | \Psi(x) \rangle$,
but if one wants not only to bound the ground state energy, but also to analyse
the ground state approximant $|\Psi(x_{\min})\rangle$, it is desirable to be able
to calculate expectation values for other observables, too.

"Efficient" means here a computation time at most polynomial in the
number of parameters K. As the dimension of the Hilbert space $\mathcal{H}$ typically
scales exponentially in the number N of constituents of the system, we want
K to not do the same. Thus, the map $\Psi$, considered as a family $\Psi_N$ of maps
for different system sizes N, should be such that the dimension K of its
domain scales only polynomially with N, and thus polylogarithmically with
$\dim \mathcal{H}$.

(ii) There should be reason to expect that there are states $|\Psi(x)\rangle$ within
the range of the map $\Psi$ that have large overlap with the true ground state or
at least an energy near to the true ground state energy. As the range of $\Psi$ is a
sub-manifold of $\mathcal{H}$ of dimension at most $K \ll \dim \mathcal{H}$, this
requires it either to be folded and twisted in a quite peculiar way to reach many
different regions of $\mathcal{H}$, or to happen to occupy the same small part of
$\mathcal{H}$ as the ground state. Typically, it is not possible to prove such a
statement, and one hence has to make do with heuristic arguments or numerical
evidence.

(iii) There should be reason to expect that the minimisation programme 
(8.1) succeeds in finding a good minimum and does not get stuck in a bad 
local minimum. It is often not justified to hope to find the global minimum, 
but a local minimum of an energy only slightly higher than that of the global 
minimum is hardly worse. 

Whether the minimisation can succeed depends on the "energy land¬ 
scape", i. e. the graph of E(x). If this landscape has many local minima, a 
naive multi-start optimisation cannot succeed. Often, the number of local 
minima increases exponentially with N or K, which may render a method 
that is efficient for small systems useless for larger ones. Hence, one usually
has to succeed in tailoring a heuristic that helps to find good minima for the
specific kind of energy landscape one has to deal with.

One of the best studied variational methods is finite-length DMRG, and
we shall illustrate the conditions given above by briefly discussing how
DMRG (in the formulation of Ref. [VPC04]) fulfils them. For DMRG, the
class of variational states consists of the matrix product states [OR95, R097]. For
an N-site matrix product state, an efficient algorithm exists to evaluate the
expectation value of any observable that can be written as a sum of tensor
products of local operators in time linear in N. This meets condition (i). The
expectation that a matrix product state is a good approximant for the ground
state of a generic 1D system (condition (ii)) is the very rationale that led
White and Noack to their idea of keeping the lowest-lying eigenstates not
of the short-range Hamiltonian but of the corresponding density matrix, as
explained e. g. in [Whi98]. The fact that condition (iii) is fulfilled, i. e. that
the "sweeping procedure" of finite-length DMRG does not get stuck in local
minima, is somewhat mysterious, especially in the light of the possibility
of constructing Hamiltonians for which this cannot be avoided [Eis06].
Nevertheless, the construction principle, as exposed in [VPC04], shows that
the matrix for each site has direct influence only on this site and its
neighbours, i. e. matrix product states allow for an essentially local description of
states despite the existence of a significant amount of entanglement. Hence, it
seems natural that, barring "pathological" cases such as those discussed
in [Eis06], the local variation of matrices during sweeps allows for a good
minimisation, provided the initial N-site state was chosen well (which is
the task of the so-called "warm-up", which uses infinite-length DMRG).
Furthermore, Schuch et al. have recently shown a close connection between
approximability and Rényi entropy for matrix product states [SWVC07a].

We shall come back to some of these points when comparing our varia¬ 
tional states with matrix product states at the end of Sec. 8.3. 


8.3 The class of variational states 


8.3.1 Basic idea 


Our class of quantum states derives from the so-called weighted graph states,
which were introduced in [DHH+05, RBB03] and also used in [HCDB05].
They are a generalisation of graph states (introduced in [BR01], see [HDE+06]
for a review). For a Hilbert space of N qubits, they are defined as²

$$|\Gamma\rangle = \prod_{a=1}^{N} \prod_{b=a+1}^{N} W^{(a,b)}_{\varphi_{ab}} \, |+\rangle^{\otimes N}, \tag{8.2}$$

where a product of phase gates $W_\varphi$ is applied onto a tensor product of
$|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ states. These phase gates are two-qubit
operations, diagonal in the computational basis, and of the form³

$$W_\varphi = \mathrm{diag}(1, 1, 1, e^{i\varphi}) = \exp\left( i \, \frac{\varphi}{4} \, (1 - \sigma_z) \otimes (1 - \sigma_z) \right). \tag{8.3}$$


It may help to see the effect of W on small states. This is, e. g., a
three-qubit weighted graph state (where the qubits are numbered 1, 2, 3 from left to
right):

$$W^{(1,2)}_{\varphi_{12}} W^{(1,3)}_{\varphi_{13}} W^{(2,3)}_{\varphi_{23}} |{+}{+}{+}\rangle = \frac{1}{\sqrt{8}} \Big( |000\rangle + |100\rangle + |010\rangle + e^{i\varphi_{12}} |110\rangle + |001\rangle + e^{i\varphi_{13}} |101\rangle + e^{i\varphi_{23}} |011\rangle + e^{i(\varphi_{12} + \varphi_{13} + \varphi_{23})} |111\rangle \Big)$$

2 In this article, superscripts in parentheses always indicate the spins an operator acts on.
Hence $W_\varphi$ is an operator defined on a 2-spin space, while $W^{(a,b)}_\varphi$ is defined
on the full N-spin Hilbert space, but has support only on spins a and b.

3 In [DHH+05, HCDB05], the notation $U_{\varphi_{ab}}$ is used instead of $W_{\varphi_{ab}}$. Here, we use the W to
emphasise that it is a specific, and not some general unitary. Note also the absence of a minus
sign in the exponential $e^{i\varphi}$, which differs from the convention used in [DHH+05].
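
The three-qubit example generalises directly: in the computational basis, each amplitude of a weighted graph state is $2^{-N/2}$ times a phase that collects $\varphi_{ab}$ for every pair of qubits that are both in |1⟩. The following sketch builds the full state vector for small N under that rule. The function name, the 0-based qubit numbering and the numerical phases are assumptions made for illustration only; the brute-force enumeration is exponential in N and is meant purely as a check of the definition.

```python
import cmath
import itertools

def weighted_graph_state(phases, n_qubits):
    """Amplitudes of prod_{a<b} W^{(a,b)} |+>^N as a dict {bit tuple: amplitude}.

    `phases` maps pairs (a, b) with a < b (0-based qubit labels) to phi_ab.
    Pairs that are missing from the dict carry no phase gate.
    """
    norm = 2.0 ** (-n_qubits / 2.0)
    amplitudes = {}
    for bits in itertools.product((0, 1), repeat=n_qubits):
        # W_phi only acts non-trivially when both qubits are in |1>.
        total = sum(phi for (a, b), phi in phases.items()
                    if bits[a] and bits[b])
        amplitudes[bits] = norm * cmath.exp(1j * total)
    return amplitudes

# Three qubits with arbitrary illustrative phases phi_12, phi_13, phi_23.
phases = {(0, 1): 0.3, (0, 2): 0.7, (1, 2): 1.1}
state = weighted_graph_state(phases, 3)
```

Here `state[(1, 1, 0)]` carries only the phase of the first pair, and `state[(1, 1, 1)]` carries the sum of all three, matching the expansion above.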
For every pair a, b of spins, there is a phase gate with a phase $\varphi_{ab} = \varphi_{ba}$.
The key observation and starting point of the work in [DHH+05] is that even
for very large N, we can efficiently calculate any reduced density matrix for
a subset $A \subset \{1, 2, \dots, N\}$ of the qubits as long as the number of qubits in A
(i. e. the number of qubits not traced over) is low. This calculation is efficient
in the sense that the time requirement scales only polynomially in N (though
exponentially in |A|). This is remarkable because in the generic case, the time
to calculate a reduced density matrix is exponential in N, and most classes
of states which allow for calculation of reduced density matrices in
polynomial time are bounded in the amount of entanglement that they can contain.
Especially in the case of matrix product states, this bound is the dominant
reason why DMRG cannot be applied successfully for certain settings [VPC04].
Weighted graph states, on the other hand, are not bounded in the amount of
their entanglement, as shall be explained in Sec. 8.3.3.
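
To make the efficiency claim concrete, consider the simplest case |A| = 1 for the undeformed state of Eq. (8.2). Tracing out all other qubits, the diagonal of the reduced density matrix is 1/2, and the off-diagonal element factorises into a product over the remaining qubits, $\rho_{01} = \frac{1}{2} \prod_{b \neq a} \frac{1 + e^{-i\varphi_{ab}}}{2}$, which costs O(N) instead of O(2^N). The sketch below compares this product formula with a brute-force partial trace. It is a worked special case derived here for illustration, not the general algorithm of Sec. 8.4; all function names and the phase values are assumptions.

```python
import cmath
import itertools

def wgs_amplitude(phi, bits):
    """Amplitude <bits|Gamma> of the weighted graph state of Eq. (8.2).

    `phi` is a symmetric N x N nested list of phases phi_ab.
    """
    n = len(bits)
    angle = sum(phi[a][b] for a in range(n) for b in range(a + 1, n)
                if bits[a] and bits[b])
    return cmath.exp(1j * angle) / 2.0 ** (n / 2.0)

def rho_single_bruteforce(phi, a):
    """Reduced density matrix of qubit a by summing over all other qubits."""
    n = len(phi)
    rho = [[0j, 0j], [0j, 0j]]
    for rest in itertools.product((0, 1), repeat=n - 1):
        amp = [wgs_amplitude(phi, rest[:a] + (s,) + rest[a:]) for s in (0, 1)]
        for s in (0, 1):
            for t in (0, 1):
                rho[s][t] += amp[s] * amp[t].conjugate()
    return rho

def rho_single_product(phi, a):
    """The same matrix from a product over the N - 1 other qubits, cost O(N)."""
    off = 0.5 + 0j
    for b in range(len(phi)):
        if b != a:
            off *= (1.0 + cmath.exp(-1j * phi[a][b])) / 2.0
    return [[0.5 + 0j, off], [off.conjugate(), 0.5 + 0j]]

# Arbitrary symmetric phases for N = 5 qubits (illustrative values).
N = 5
phi = [[0.0] * N for _ in range(N)]
for a in range(N):
    for b in range(a + 1, N):
        phi[a][b] = phi[b][a] = 0.1 * (a + 1) + 0.37 * (b + 1)

rho_slow = rho_single_bruteforce(phi, 2)
rho_fast = rho_single_product(phi, 2)
```

The product form remains usable for system sizes where the brute-force sum is out of reach, which is the property the variational method relies on.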

There is no guarantee that these states spread through those parts of the 
Hilbert space which are of interest for us, and hence, we add as many further 
degrees of freedom to the form (8.2) as possible without losing the ability to 
efficiently calculate reduced density matrices. As will be demonstrated in 
Sec. 8.4, the following additions do not hinder the efficiency of the reduced 
density matrix evaluation: (i) Let the phase gates act not simply on $|+\rangle^{\otimes N}$,
but on any N-qubit product state. (ii) Even weighted superpositions of m
product states can be treated, provided m is small. (iii) After the phase gates,
arbitrary local unitaries may be applied.

8.3.2 Parametrisation 

Deviating from the treatment in [APD+06], we develop the formulae not just
for spin-1/2 particles but, more generally, for n-level systems, i. e. our states
live in a Hilbert space $\mathcal{H} = (\mathbb{C}^n)^{\otimes N}$.

Superposition of product states 

We start with a superposition of m product states, which we write

$$\mathrm{nrm} \sum_{j=1}^{m} \alpha_j \bigotimes_{a=1}^{N} \left( |0\rangle + d^j_{a,1} |1\rangle + d^j_{a,2} |2\rangle + \dots + d^j_{a,n-1} |n-1\rangle \right) = \mathrm{nrm} \sum_{j=1}^{m} \alpha_j \bigotimes_{a=1}^{N} \sum_{s \in S} d^j_{a,s} |s\rangle. \tag{8.4}$$

The operator nrm denotes normalisation: $\mathrm{nrm}\, |\psi\rangle := |\psi\rangle / \|\, |\psi\rangle \|$.
To facilitate notation, we also introduced

V := {1, 2, ..., N} (set of spins)

S := {0, 1, ..., n − 1} (set of levels)
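As we normalise afterwards, we can fix the coefficient in front of |0⟩ to 1:
$d^j_{a,0} = 1$ for all a, j. It will also be useful later to introduce the deformation
operators

$$D_{\mathbf{d}} := \sum_{s \in S} d_s |s\rangle\langle s|, \quad \text{with } \mathbf{d} = (d_0, d_1, \dots, d_{n-1}),$$

and the n-level |+⟩ state

$$|+\rangle := \frac{1}{\sqrt{n}} \sum_{s \in S} |s\rangle,$$

such that the state (8.4) can now be written in the forms

$$\mathrm{nrm} \sum_{j=1}^{m} \alpha_j \left( \bigotimes_{a \in V} D_{\mathbf{d}^j_a} \right) |+\rangle^{\otimes N} = \mathrm{nrm} \sum_{\mathbf{s} \in S^N} \left( \sum_{j=1}^{m} \alpha_j \prod_{a \in V} d^j_{a,s_a} \right) |\mathbf{s}\rangle. \tag{8.5}$$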


Phase gate 


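We entangle these product states by applying onto each pair a, b of spins a
generalisation $W_\Phi$ of the 2-level phase gate $W_\varphi$ from Eq. (8.3). We want to
define $W_\Phi$ as generally as possible, but have to meet three constraints: (i) All
$W_\Phi$ have to commute (because otherwise the calculation of reduced density
matrices explained later in Sec. 8.4 does not work). Hence, they have to be
diagonal. (ii) $W_\Phi$ has to be unitary (for the same reason). Hence, the entries in
its diagonal have to be pure phases. (iii) $W_\Phi$ should not have any parameters
which can be absorbed without loss of generality into the $d^j_{a,s}$. To see which
these are, let us look at the example of n = 3:

$$W_\Phi \left[ \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \otimes \begin{pmatrix} \gamma_0 \\ \gamma_1 \\ \gamma_2 \end{pmatrix} \right] = \begin{pmatrix} e^{i\Phi_{00}} \beta_0 \gamma_0 \\ e^{i\Phi_{01}} \beta_0 \gamma_1 \\ e^{i\Phi_{02}} \beta_0 \gamma_2 \\ e^{i\Phi_{10}} \beta_1 \gamma_0 \\ e^{i\Phi_{11}} \beta_1 \gamma_1 \\ e^{i\Phi_{12}} \beta_1 \gamma_2 \\ e^{i\Phi_{20}} \beta_2 \gamma_0 \\ e^{i\Phi_{21}} \beta_2 \gamma_1 \\ e^{i\Phi_{22}} \beta_2 \gamma_2 \end{pmatrix} = \begin{pmatrix} \zeta_{00} \\ \zeta_{01} \\ \zeta_{02} \\ \zeta_{10} \\ \zeta_{11} \\ \zeta_{12} \\ \zeta_{20} \\ \zeta_{21} \\ \zeta_{22} \end{pmatrix}$$

If one is given the $\zeta_{st}$ and can choose the $\beta_s$ and $\gamma_t$ at will, one does not need
the freedom to set all entries in $W_\Phi$. It suffices to have 4 phases:

$$W_\Phi = \mathrm{diag}(1, 1, 1, 1, e^{i\Phi_{11}}, e^{i\Phi_{12}}, 1, e^{i\Phi_{21}}, e^{i\Phi_{22}}). \tag{8.6}$$

In general, for n levels, one needs to specify $(n - 1)^2$ phases for each phase
gate $W_\Phi$. We denote the phases by an $(n - 1) \times (n - 1)$ matrix $\Phi$ (with
elements $\Phi_{st}$) and have

$$W_\Phi = 1_{n \times n} \oplus \bigoplus_{s=1}^{n-1} \mathrm{diag}\left( 1, e^{i\Phi_{s1}}, e^{i\Phi_{s2}}, \dots, e^{i\Phi_{s,n-1}} \right).$$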
Defining $\Phi_{s0} = 0$ and $\Phi_{0t} = 0$ for all $s, t \in S$, we can simply write

$$W_\Phi = \sum_{s,t \in S} e^{i\Phi_{st}} |st\rangle\langle st|. \tag{8.7}$$

s,te S 

Our variational states now take the following form:

$$|\Psi(x)\rangle = \mathrm{nrm} \left( \bigotimes_{a=1}^{N} U_a \right) \sum_{j=1}^{m} \alpha_j \left( \prod_{\substack{a,b \in V \\ a<b}} W^{(a,b)}_{\Phi^{ab}} \right) \bigotimes_{a=1}^{N} \sum_{s \in S} d^j_{a,s} |s\rangle. \tag{8.8}$$

The vector x is a concatenation of all the parameters that are present in the
right-hand side, i. e. the (real) parameters of x contain the real and imaginary
parts of the complex scalars $d^j_{a,s}$ and $\alpha_j$, the (real) entries $\Phi^{ab}_{st}$ of the phase
matrices, and the parameters describing the N local unitaries $U_a \in SU(n)$,
a = 1, ..., N.

Parametrisation of the unitaries 

Next, we need to choose a parametrisation of SU(n) in order to describe the
unitary matrices U. For this, we use an isomorphism between the set SU(n)
of unitary n × n matrices U and the set of Hermitean n × n matrices A because
Hermitean matrices are easy to parametrise. We could use (a) the matrix
exponentiation U = exp iA or (b) the Cayley transform (introduced 1846 by
Cayley, see e. g. [Puz05])

$$U = (i\mathbb{1} + A)(i\mathbb{1} - A)^{-1}. \tag{8.9}$$

To calculate these expressions numerically, we need, for (a), a matrix
diagonalisation and, for the matrix inversion in (b), an LU factorisation [TB97]. We
choose the Cayley transform, not only because it is slightly faster, but
especially because we will later have to evaluate the derivatives of U with respect
to its parameters, and while this is very involved for (a) [NH95], it is rather
trivial for (b) [P1M]. (A disadvantage seems to be at first glance that
the Cayley transform is undefined if A has -1 as eigenvalue, because then,
$(i\mathbb{1} - A)$ cannot be inverted. The algorithm will not, however, converge to
this case, and if it happened to hit on it, the program would abort.)
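
The Cayley transform (8.9) can be checked numerically on a small case. The sketch below is restricted to n = 2 so that the inverse can be written in closed form with the standard library alone; for general n one would use an LU-based solver as mentioned above. The matrix A and all function names are illustrative assumptions.

```python
def matmul2(P, Q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(P[r][k] * Q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger2(U):
    """Conjugate transpose of a 2 x 2 complex matrix."""
    return [[U[c][r].conjugate() for c in range(2)] for r in range(2)]

def cayley_2x2(A):
    """U = (i*1 + A) (i*1 - A)^{-1} for a 2 x 2 Hermitean matrix A."""
    plus = [[(1j if r == c else 0j) + A[r][c] for c in range(2)]
            for r in range(2)]
    minus = [[(1j if r == c else 0j) - A[r][c] for c in range(2)]
             for r in range(2)]
    # Closed-form inverse of the 2 x 2 matrix (i*1 - A).
    det = minus[0][0] * minus[1][1] - minus[0][1] * minus[1][0]
    inverse = [[minus[1][1] / det, -minus[0][1] / det],
               [-minus[1][0] / det, minus[0][0] / det]]
    return matmul2(plus, inverse)

# An arbitrary Hermitean matrix: real diagonal, conjugate off-diagonals.
A = [[0.3, 0.2 - 0.5j],
     [0.2 + 0.5j, -1.1]]
U = cayley_2x2(A)
UUdag = matmul2(U, dagger2(U))  # should be the identity up to rounding
```

A vanishing parameter matrix A = 0 maps to U = 1, so the identity is a natural starting point for a minimisation over A.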

Parameter count 

Let us now count the number K of real parameters needed to describe a state
$|\Psi(x)\rangle$:

• For each phase gate, we need $(n - 1)^2$ real numbers. In case of one
phase matrix for each pair of spins, there are N(N − 1)/2 gates.

• For the deformations, i. e., the specification of the initial product states,
we need 2mN(n − 1) real numbers.

• For the superposition coefficients, 2m reals.

• An n × n Hermitean matrix is specified by n(n − 1)/2 complex entries
in one of the triangles above or below the diagonal and n real entries in
the diagonal, i. e., by $n^2$ real numbers. Hence, we need for the N unitaries
a total of $N n^2$ real parameters.

Thus, the number of parameters is

$$K = \frac{N(N-1)}{2} (n - 1)^2 + 2mN(n - 1) + 2m + N n^2 = O(N^2 n^2 + N n m). \tag{8.10}$$
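
The count above can be tabulated with a few lines of code. The sketch below follows the four contributions listed in the text and counts every complex off-diagonal entry of a Hermitean matrix as two real numbers, so that each local unitary contributes n² reals; the function name and the example values are assumptions. Comparing K with the Hilbert space dimension n^N shows the polynomial versus exponential scaling required by condition (i) of Sec. 8.2.

```python
def parameter_count(N, n, m):
    """Number K of real parameters of the variational state (8.8).

    N: number of spins, n: levels per spin, m: number of superposed states.
    Assumes one independent phase matrix per pair of spins (no symmetrisation).
    """
    phase_gates = (n - 1) ** 2 * (N * (N - 1) // 2)
    deformations = 2 * m * N * (n - 1)
    superposition = 2 * m
    unitaries = N * n * n  # n real diagonal + n(n-1)/2 complex entries each
    return phase_gates + deformations + superposition + unitaries

# Illustrative comparison for qubits (n = 2) with m = 3 superposed states.
for N in (10, 20, 40):
    K = parameter_count(N, 2, 3)
    print(N, K, 2 ** N)
```

Doubling N roughly quadruples K, whereas the Hilbert space dimension is squared.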


8.3.3 Entanglement properties 

As already mentioned, an important motivation for this work was the goal
to find a class of states which exhibit strong entanglement over arbitrary
distances that is somewhat "generic". After all, the limited ability to describe
such entanglement is a common shortcoming of many approximation
methods for many-body quantum mechanics. For the case of DMRG, this has been
studied in detail in Ref. [VPC04]. There, it was shown that the matrix
product states that arise during DMRG can be understood as "projections" from
an auxiliary linear quantum system of the valence bond solid type [VC04b].
Hence, whenever one cuts the matrix product state "chain" into two parts,
the blockwise entanglement (i. e., the entropy of the reduced density matrix
of either part) is bounded by $2 \log_2 D$, where D is the dimension of
the auxiliary spins, which is equal to the number of "kept states" in DMRG
parlance or the matrix size in the matrix product state picture. This explains
why DMRG does not perform well when applied to long 1D systems with
long-range entanglement or, more precisely, to systems where the blockwise
entanglement grows with the block size.

A scaling of the entanglement is hardly avoidable when treating systems 
with more than one dimension. According to the various "area law" the¬ 
orems and conjectures, for most systems the entanglement of a block versus 
the rest of the system scales linearly with the area of the interface between this 
block and the rest [AEPW02, PEDC05, Wol06, CEPD06, WVHC07]. Hence,
for, say, a 2D system, the entanglement scales linearly with the surface area of
the block and matrix product states are unable to render this feature without 
their matrix size growing quite fast. There are ways of replacing the matri¬ 
ces with higher-rank tensors to keep up with the area law, yielding so-called 
projected entangled pair states (PEPSs) [VC04a] but the formalism of these is 
rather tedious and grows more complicated with increasing spatial dimen¬ 
sion. Also, PEPSs cannot go beyond the area law and are hence still unable 
to treat systems that do not follow the area law, i.e., show entanglement that 
scales superlinearly with the block surface, which typically is the case in crit¬ 
ical and certain disordered systems [Kor04, KM05, CEP07, VWPC06, EO06, 
BCS06]. 
We hope that our variational method turns out to be a viable complementary
method especially to this "PEPS" generalization of DMRG. To see how
this claim may be substantiated, note that in the description of our states, the 
geometry of the system has not entered yet. Every spin is connected to ev¬ 
ery other spin by a phase gate, and we can thus model any geometry, i. e., 
any scheme of neighboring relations. The entanglement of a block of M spins 
w. r. t. the rest of the system (with N — M spins) can scale with the number 
of spins M, i.e. with the volume and not with the surface area of the block 
[CHDB05]. Thus, the blockwise entanglement can reach the maximum value 
that is possible in the given Hilbert space. Other entanglement measures 
such as localizable entanglement between pairs of spins and also two-point 
correlation functions can reach their maximum value (independent of the dis¬ 
tance), but can also show exponential or polynomial decay [DHH + 05]. This is 
already evident from the fact that 2D cluster states are within our variational 
class, and they reach maximum entanglement in several senses [NMDB06], 
e. g. the localizable entanglement between all pairs of spins is one. 


8.3.4 Making use of symmetries 


Symmetrising the phases 

The quadratic scaling of K with the number N of spins (lattice sites) in Eq. 
(8.10) can be reduced to a linear scaling in case of a system Hamiltonian with 
translational symmetry. This is because in this case it is reasonable to assume 
that we do not lose precision if we let the phase matrices depend not on the 
absolute positions of the spins a and b but only on the position of b relative to 
a. More precisely, we introduce a mapping $\nu : V \times V \to \{1, \dots, R\}$ that gives
the phase index for the spin pair a, b: the phase gate that is applied on the pair
(a, b) shall use the phase matrix with number $\nu(a, b)$, and R is the total number
of phase matrices. The 4th-order tensor $\Phi^{ab}_{st}$ now becomes a 3rd-order tensor
$\Phi^{\nu}_{st}$.

The mapping $\nu$ has to be constructed such that two pairs of spins, (a, b)
and (c, e), get the same index, $\nu(a, b) = \nu(c, e)$, if and only if the pair (a, b) can
be mapped onto (c, e) by a symmetry transformation that leaves the system
Hamiltonian invariant. For the common case of a Hamiltonian that is a sum
of identical terms which each act on one bond (i. e., connection of lattice sites),
this is the symmetry group of the lattice. In the case of a square lattice with
N = L × L sites on periodic boundary conditions (PBC), only⁴

$$R = \frac{1}{2} \left\lfloor \frac{L}{2} \right\rfloor \left( \left\lfloor \frac{L}{2} \right\rfloor + 3 \right) = O(N)$$

phase matrices are needed as can be seen from Fig. 8.1, and thus, we need
only K = O(N) parameters.

4 The brackets ⌊·⌋ denote the floor function.
Figure 8.1: For a system on an L × L square lattice with periodic boundary
conditions and a system Hamiltonian that is invariant under the lattice's
symmetry group, only R = O(N) phase matrices are needed. The numbers
indicate the numbering of these matrices with phase indices $\nu = 0, \dots, R - 1$.
The circles denote sites of a 6 × 6 lattice. In constructing a variational state
(8.8) on this lattice, phase gates $W_{\Phi^\nu}$ are performed on any pair of sites, where
the phase gate acting on the purple shaded site and a site marked with the
number $\nu$ uses the phase matrix $\Phi^\nu$. Translations of these markings show the
phase indices for other site pairs. Note how, due to the rotation and reflection
symmetries of the square lattice, the pattern of phase indices is repeated
eight times.


Note also that $\nu$ is naturally symmetric, $\nu(a, b) = \nu(b, a)$, and that this has
to be reflected by a like symmetry of $\Phi$ w. r. t. its level indices s and t:
$\Phi^{\nu}_{st} = \Phi^{\nu}_{ts}$, which must be imposed explicitly.
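
One possible way to construct the phase index mapping for the L × L torus is sketched below. It assumes that the relevant symmetry group consists of all lattice translations together with the rotations and reflections of the square, so that a pair of sites is characterised by its displacement reduced to the shortest wrap-around distance in each direction, with the two components sorted. Sites are labelled by integers, the indices start at 0 as in Fig. 8.1, and all names are illustrative assumptions.

```python
def phase_index_map(L):
    """Return (nu, R) for an L x L square lattice with periodic boundaries.

    nu maps ordered pairs (a, b) of distinct sites to a phase index in
    0, ..., R - 1; site k has coordinates (x, y) = divmod(k, L).
    """
    def displacement_class(a, b):
        ax, ay = divmod(a, L)
        bx, by = divmod(b, L)
        # Shortest distance around the torus in each direction.
        dx = min((ax - bx) % L, (bx - ax) % L)
        dy = min((ay - by) % L, (by - ay) % L)
        # Sorting identifies displacements related by rotation or reflection.
        return (max(dx, dy), min(dx, dy))

    # By translation invariance, the classes seen from site 0 are all classes.
    classes = sorted({displacement_class(0, b) for b in range(1, L * L)})
    index = {cls: i for i, cls in enumerate(classes)}
    nu = {(a, b): index[displacement_class(a, b)]
          for a in range(L * L) for b in range(L * L) if a != b}
    return nu, len(classes)

nu6, R6 = phase_index_map(6)  # the 6 x 6 lattice of Fig. 8.1
```

Under these assumptions the enumeration for the 6 × 6 lattice yields 9 distinct classes for the 630 pairs of sites, and the mapping is symmetric under exchange of a and b by construction.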

Full symmetrisation 

For a symmetric Hamiltonian, it seems natural to reflect this symmetry not
only in the phases $\Phi$, but also in the local, site-dependent properties, i. e., in
the local unitaries $U_a$ and the deformation parameters $d^j_a$. In case of full
translation symmetry, one may want to completely drop the dependence of these
on the site index a. This does indeed reduce the number K of parameters
significantly, but not as dramatically as in the case of phase symmetrisation.
The latter reduced the scaling of K from $O(N^2)$ to O(N), while further
symmetrisation of the other parameters cannot change K = O(N). On the other
hand, the time required to calculate the energy of a given state is reduced by
a factor O(N) in the fully symmetric case, as one needs to evaluate it for only
one elementary cell of the lattice.

A good reason not to impose full symmetrisation nevertheless is the
observation that for many systems, the ground state does not necessarily obey
the full symmetry of the Hamiltonian due to spontaneous symmetry
breaking. Even though in such a case the ground state must be degenerate, and at
least one state within the ground subspace must obey the full symmetry, this
state is unlikely to be the state that is easiest to approximate within the chosen
class of variational states. To give an example: The ground state of the
antiferromagnetic Ising chain without transverse field is
$\alpha |0101\dots\rangle + \beta |1010\dots\rangle$,
for any $\alpha, \beta$ with $|\alpha|^2 + |\beta|^2 = 1$. Only for $\alpha = \beta$ is the state invariant
under a translation of one site. However, the state most easily approximated is
$\alpha = 1$, $\beta = 0$ (or vice versa), as it is a product state, while any other state
contains long-range entanglement. If we imposed full translational symmetry
onto the states, our algorithm would likely fail to find a good state. However,
the example suggests a compromise between flexibility and low number of
parameters: We make the local properties $U_a$ and $d^j_a$ periodic in a way that
matches the expected periodicity of the spontaneously-broken ground state,
e. g., in the case of the Ising chain, we may use one common unitary and
one common deformation vector for all odd sites, and another unitary and
another deformation vector for all even sites. However, our numerical
experiments showed that this does not work particularly well: the enforcement of
such symmetries introduces very many additional local minima which trap
the minimization routine much too soon. The intuitive reason for this is that
enforcing the symmetries amounts to a cut through the energy landscape of
the parameter space which seems to divide meandering troughs into
separated basins.

Let us nevertheless mention two more possibilities to reduce the parameter
scaling even further. (i) If we make the phase index mapping $\nu$ such that
it does not depend on the geometric relation as in Fig. 8.1 but just on the
scalar number of lattice steps that separates the spins, the number of phase
indices scales linearly with the length $L$ only, not with the number of sites
$N = O(L^{\mathcal D})$ (where $\mathcal D$ is the dimension of the system). Together
with a full or periodic symmetrization of the local properties, we reach a scaling
of the number of parameters of $K = O(L)$, which allows for a quick treatment even
of 3D systems of moderate size. The accuracy achieved this way is, however,
very modest.

(ii) Often, one may expect long-range entanglement to be suppressed
exponentially. Then one can choose a distance threshold and fix to zero all
phases between spins with a distance above this threshold. The threshold
will typically be chosen of the order of the entanglement length, and as the
latter usually does not increase strongly with the system size (except at
criticality), one can save considerably on the number of parameters.
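To make the counting behind options (i) and (ii) concrete, the following sketch enumerates the phase indices that remain when the index depends only on the number of lattice steps. It assumes a periodic hypercubic lattice and takes "number of lattice steps" to mean the shortest path length along lattice bonds; the function names and the `threshold` argument are our own illustrative choices.

```python
def steps_between(a, b, L):
    """Number of lattice steps between sites a and b on a periodic lattice
    with edge length L (shortest path along lattice bonds)."""
    total = 0
    for x, y in zip(a, b):
        d = (x - y) % L
        total += min(d, L - d)
    return total


def distance_indices(L, dim, threshold=None):
    """Step counts that need their own phase matrix.

    Without a threshold, the step count between distinct sites runs from
    1 to dim * (L // 2), i.e. the number of indices is linear in L.
    With a threshold, phases for larger separations are fixed to zero.
    """
    max_steps = dim * (L // 2)
    if threshold is not None:
        max_steps = min(max_steps, threshold)
    return list(range(1, max_steps + 1))
```

For a 3D lattice with $L = 32$ this leaves 48 phase matrices, compared with a number growing like $L^3$ for the geometric mapping.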


8.4 Evaluating observables 

In order to evaluate an observable $O$ with support on $A \subset V$, we need to
evaluate

\[ \langle O \rangle_x = \operatorname{tr} O \rho_A \]

with

\[ \rho_A := \operatorname{tr}_{V \setminus A} |\Psi(x)\rangle\langle\Psi(x)| . \]

As we shall see now, $\rho_A$ can be calculated in time polynomial in the number
$N - |A|$ of spins that are traced over, the number $n$ of levels per spin and the
number $m$ of superpositions, but exponential in the number $|A|$ of spins not
traced over. Hence, the expectation value $\langle O \rangle_x$ of observables can be
calculated efficiently as long as $O$ is a sum of terms with small support.

In particular, we need this algorithm to evaluate the energy
$\langle\Psi(x)|H|\Psi(x)\rangle$, as this is the quantity we wish to minimise. Thus,
due to the scaling properties just mentioned, we require that the system
Hamiltonian can be written as a sum of terms with small support (as is nearly
always the case).

8.4.1 A pair of spins 

To keep notation simple, we only derive the procedure to obtain the two-spin
density matrix ($A = \{a, b\}$)

\[ \rho_{ab} := \rho_{\{a,b\}} = \operatorname{tr}_{V \setminus \{a,b\}} |\Psi(x)\rangle\langle\Psi(x)| . \qquad (8.11) \]

This is a generalisation of the work done in [DHH+05] for spin-1/2. A further
generalisation to more than two spins is easy and its result will be given at
the end of this section.

The spins that we do not trace over are denoted $a$ and $b$. We start by
inserting Eq. (8.8) into Eq. (8.11) and pull as much as possible out of the partial
trace:

\[
\rho_{ab} = \operatorname{nrm}\; (U_a \otimes U_b)\, W_{\Phi_{ab}}
  \Big( \sum_{j,k=1}^{m} \alpha_j \alpha_k^{*}\,
        (D_{d_a^j} \otimes D_{d_b^j})\; \tilde\rho^{jk}_{ab}\;
        (D_{d_a^k} \otimes D_{d_b^k})^{\dagger} \Big)
  W_{\Phi_{ab}}^{\dagger}\, (U_a \otimes U_b)^{\dagger} . \qquad (8.12)
\]

Here, the operator nrm again means normalisation, now defined as
$\operatorname{nrm} \rho := \rho / \operatorname{tr} \rho$, and the inner term
$\tilde\rho^{jk}_{ab}$ contains anything that cannot be pulled out of the partial
trace:

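\[
\tilde\rho^{jk}_{ab} = \operatorname{tr}_{V \setminus \{a,b\}}
   |\tilde\psi^{j}_{ab}\rangle\langle\tilde\psi^{k}_{ab}|
\]

with

\[
|\tilde\psi^{j}_{ab}\rangle =
  \Big( \prod_{c \in V \setminus \{a,b\}} W_{\Phi_{ac}} W_{\Phi_{bc}} \Big)
  \Big( \sum_{s_a, s_b \in S} |s_a s_b\rangle \otimes
        \bigotimes_{c \in V \setminus \{a,b\}} \sum_{s \in S} d^{j}_{c,s} |s\rangle \Big),
\]

which is, due to Eqs. (8.5) and (8.7),

\[
|\tilde\psi^{j}_{ab}\rangle = \sum_{s \in S^N}
  \Big[ \prod_{c \in V \setminus \{a,b\}} d^{j}_{c,s_c}
        \exp i \big( \Phi_{ac}^{s_a s_c} + \Phi_{bc}^{s_b s_c} \big) \Big] |s\rangle .
\qquad (8.13)
\]

Note that in the trace (8.13) all the phase gates $W_{\Phi_{ce}}$ with
$c, e \notin \{a, b\}$ cancel with their Hermitian conjugates, as do all the local
unitaries $U_c$, $c \notin \{a, b\}$. Hence, $\rho_{ab}$ depends only on a subset of
the parameters.

In order to take the trace in Eq. (8.13), we have to sum over all states
$|\underline{q}\rangle$, $\underline{q} \in S^{N-2}$, where the underline denotes that
the components of $\underline{q}$ are not indexed $(q_1, q_2, \dots, q_{N-2})$ but
rather using the elements of $V \setminus \{a, b\}$ as indices. We get

\[
\begin{aligned}
\tilde\rho^{jk}_{ab}
 &= \sum_{\underline{q} \in S^{N-2}}
    \langle \underline{q} | \tilde\psi^{j}_{ab} \rangle
    \langle \tilde\psi^{k}_{ab} | \underline{q} \rangle \\
 &= \sum_{\underline{q} \in S^{N-2}} \sum_{s, s' \in S^N}
    \langle \underline{q} | s \rangle \langle s' | \underline{q} \rangle
    \prod_{c \in V \setminus \{a,b\}} d^{j}_{c,s_c}\, d^{k*}_{c,s'_c}
    \exp i \big( \Phi_{ac}^{s_a s_c} + \Phi_{bc}^{s_b s_c}
               - \Phi_{ac}^{s'_a s'_c} - \Phi_{bc}^{s'_b s'_c} \big) \\
 &= \sum_{r, r' \in S^2} |r\rangle\langle r'|
    \sum_{\underline{q} \in S^{N-2}} \prod_{c \in V \setminus \{a,b\}}
    d^{j}_{c,q_c}\, d^{k*}_{c,q_c}
    \exp i \big( \Phi_{ac}^{r_1 q_c} + \Phi_{bc}^{r_2 q_c}
               - \Phi_{ac}^{r'_1 q_c} - \Phi_{bc}^{r'_2 q_c} \big) .
\end{aligned}
\]

In the last line, we can exchange sum and product in the following manner
without changing the expression:

\[
\sum_{\underline{q} \in S^{N-2}} \; \prod_{c \in V \setminus \{a,b\}}
 \;=\; \prod_{c \in V \setminus \{a,b\}} \; \sum_{q_c \in S} .
\]

This gives

\[
\tilde\rho^{jk}_{ab} = \sum_{r, r' \in S^2} |r\rangle\langle r'|
  \prod_{c \in V \setminus \{a,b\}}
  \underbrace{ \sum_{q \in S} d^{j}_{c,q}\, d^{k*}_{c,q}
    \exp i \big( \Phi_{ac}^{r_1 q} + \Phi_{bc}^{r_2 q}
               - \Phi_{ac}^{r'_1 q} - \Phi_{bc}^{r'_2 q} \big) } .
\qquad (8.14)
\]

The sum over $S$ has $n$ terms, and $N - 2$ such sums are multiplied. Hence,
in order to calculate one matrix element of $\tilde\rho^{jk}_{ab}$ we have to
evaluate the summand of the underbraced term $(N - 2)\,n$ times. This is the origin
of the promised polynomial scaling for the calculation of expectation values.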
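The saving gained by exchanging sum and product can be checked numerically on arbitrary numbers. In the following minimal sketch, `factors[c][q]` stands for the summand of Eq. (8.14) for environment spin $c$ and level $q$ at fixed $r, r'$; the function names are ours and the values used are placeholders, not physical data.

```python
import itertools


def brute_force_sum(factors):
    """Sum over all configurations q of the product over spins c.

    Needs n ** len(factors) terms, i.e. exponentially many."""
    n = len(factors[0])
    total = 0
    for q in itertools.product(range(n), repeat=len(factors)):
        term = 1
        for c, q_c in enumerate(q):
            term *= factors[c][q_c]
        total += term
    return total


def factorised_sum(factors):
    """Product over spins c of the sum over the single index q_c.

    Needs only n * len(factors) terms."""
    result = 1
    for single_spin_factors in factors:
        result *= sum(single_spin_factors)
    return result
```

Both functions return the same number up to rounding, but only the second one scales linearly with the number of spins that are traced over.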

Recall that $\tilde\rho^{jk}_{ab}$ is an $n^2 \times n^2$ matrix. We will make this
more explicit by writing the product as a Hadamard product. The Hadamard product,
denoted $\odot$, is defined as the component-wise multiplication of matrices,
$(B \odot C)_{ij} := B_{ij} C_{ij}$. Its identity, denoted $1_\odot$, is the matrix
having 1 as all of its elements. Using this, we can rewrite the previous equation
in a very compact form:


\[
\tilde\rho^{jk}_{ab} = \bigodot_{c \in V \setminus \{a,b\}} \tilde\rho^{jk}_{ab,c} ,
\qquad (8.15)
\]

where the matrix elements of $\tilde\rho^{jk}_{ab,c}$ are given by the underbraced
term in Eq. (8.14). Each factor of the Hadamard product can be understood as
resulting from the interaction of the spins $a$ and $b$ with one spin $c$ from
$V \setminus \{a, b\}$. These factors can be calculated separately because the
interaction between two different spins in $V \setminus \{a, b\}$ can be ignored
due to the cancellation of all phase gates within $V \setminus \{a, b\}$ (cf. the
remark after Eq. (8.13)).

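To make this more concrete, let us look at the simple case of $n = 2$. Then
(recall that $d^{j}_{c,0} = 1$, and $\Phi^{s,t} = 0$ for $s = 0$ or $t = 0$)

\[
\tilde\rho^{jk}_{ab} = \bigodot_{c \in V \setminus \{a,b\}}
  \big( 1_\odot + d^{j}_{c,1}\, d^{k*}_{c,1}\, \tilde\rho_{ab,c} \big)
\]

with

\[
\tilde\rho_{ab,c} =
\begin{pmatrix}
 1 & e^{-i\varphi_{bc}} & e^{-i\varphi_{ac}} & e^{-i(\varphi_{ac}+\varphi_{bc})} \\
 e^{i\varphi_{bc}} & 1 & e^{-i(\varphi_{ac}-\varphi_{bc})} & e^{-i\varphi_{ac}} \\
 e^{i\varphi_{ac}} & e^{i(\varphi_{ac}-\varphi_{bc})} & 1 & e^{-i\varphi_{bc}} \\
 e^{i(\varphi_{ac}+\varphi_{bc})} & e^{i\varphi_{ac}} & e^{i\varphi_{bc}} & 1
\end{pmatrix},
\qquad \varphi_{ac} := \Phi_{ac}^{1,1},
\qquad (8.16)
\]

where rows and columns are ordered as $r = 00, 01, 10, 11$. This is the formula
given in [DHH+05].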
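As a plausibility check of the Hadamard product form for $n = 2$, the following sketch compares it with a brute-force partial trace. It is a simplified illustration only: it assumes a single superposition ($m = 1$), no deformations, no local unitaries, and an unnormalised state with amplitudes $\exp\big(i \sum_{x<y} \varphi_{xy} s_x s_y\big)$; the matrix `phi` must be symmetric, and all function names are our own.

```python
import cmath
from itertools import product

BASIS = list(product((0, 1), repeat=2))  # r = 00, 01, 10, 11


def normalise(rho):
    """The operation nrm: divide a matrix by its trace."""
    trace = sum(rho[i][i] for i in range(len(rho)))
    return [[entry / trace for entry in row] for row in rho]


def brute_force_rho(phi, N, a, b):
    """Two-spin density matrix from an explicit sum over all 2**(N-2)
    configurations of the remaining spins."""
    others = [c for c in range(N) if c not in (a, b)]

    def amplitude(s):
        phase = sum(phi[x][y] * s[x] * s[y]
                    for x in range(N) for y in range(x + 1, N))
        return cmath.exp(1j * phase)

    rho = [[0j] * 4 for _ in range(4)]
    for q in product((0, 1), repeat=len(others)):
        column = []
        for r in BASIS:
            s = [0] * N
            s[a], s[b] = r
            for c, q_c in zip(others, q):
                s[c] = q_c
            column.append(amplitude(s))
        for i in range(4):
            for j in range(4):
                rho[i][j] += column[i] * column[j].conjugate()
    return normalise(rho)


def hadamard_rho(phi, N, a, b):
    """Same matrix from one Hadamard factor per traced-out spin c."""
    rho = [[1 + 0j] * 4 for _ in range(4)]  # Hadamard identity
    for c in range(N):
        if c in (a, b):
            continue
        for i, r in enumerate(BASIS):
            for j, rp in enumerate(BASIS):
                phase = (phi[a][c] * (r[0] - rp[0])
                         + phi[b][c] * (r[1] - rp[1]))
                rho[i][j] *= 1 + cmath.exp(1j * phase)
    # The phase gate between a and b is applied outside the partial trace.
    w = [cmath.exp(1j * phi[a][b] * r[0] * r[1]) for r in BASIS]
    rho = [[w[i] * rho[i][j] * w[j].conjugate() for j in range(4)]
           for i in range(4)]
    return normalise(rho)
```

The first routine needs a number of terms exponential in $N$, the second one only a number linear in $N$.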


8.4.2 Several spins 


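For reference, we give the result for density matrices for not simply two spins
$a, b$, but arbitrary numbers of spins, given in a set $A$:

\[
\begin{aligned}
\rho_A :={}& \operatorname{tr}_{V \setminus A} |\Psi(x)\rangle\langle\Psi(x)| \\
 ={}& \operatorname{nrm}\,
   \Big( \bigotimes_{a \in A} U_a \Big)
   \Big( \prod_{a,b \in A,\; a<b} W_{\Phi_{ab}} \Big)
   \Big[ \sum_{j,k=1}^{m} \alpha_j \alpha_k^{*}
     \Big( \bigotimes_{a \in A} D_{d_a^j} \Big)
     \Big( \bigodot_{c \in V \setminus A} \tilde\rho^{jk}_{A,c} \Big)
     \Big( \bigotimes_{a \in A} D_{d_a^k} \Big)^{\dagger} \Big] \\
   & \times \Big( \prod_{a,b \in A,\; a<b} W_{\Phi_{ab}} \Big)^{\dagger}
   \Big( \bigotimes_{a \in A} U_a \Big)^{\dagger}
\end{aligned}
\qquad (8.17)
\]

with

\[
\tilde\rho^{jk}_{A,c} =
 \Big( \sum_{s \in S} d^{j}_{c,s}\, d^{k*}_{c,s}
   \exp \Big( i \sum_{a \in A} \Phi_{ac}^{r_{\tau(a)} s}
            - i \sum_{a \in A} \Phi_{ac}^{r'_{\tau(a)} s} \Big)
 \Big)_{r, r' \in S^{|A|}} . \qquad (8.18)
\]

The mapping $\tau : A \to \{1, \dots, |A|\}$ gives here the index that spin
$a \in A$ gets within the density matrix $\rho_A$ (i.e., in the 2-spin case,
$\tau(a) = 1$ and $\tau(b) = 2$).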
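It is also useful to observe that

\[
\Big( \bigotimes_{a \in A} D_{d_a^j} \Big)
\Big( \bigodot_{c \in V \setminus A} \tilde\rho^{jk}_{A,c} \Big)
\Big( \bigotimes_{a \in A} D_{d_a^k} \Big)^{\dagger}
 = \Big( \bigodot_{c \in V \setminus A} \tilde\rho^{jk}_{A,c} \Big)
   \odot |d_A^j\rangle\langle d_A^k| \qquad (8.19)
\]

with

\[
|d_A^j\rangle = \sum_{r \in S^{|A|}} \prod_{a \in A} d^{j}_{a, r_{\tau(a)}} \, |r\rangle .
\qquad (8.20)
\]

This formula comes from the observation that a matrix product of a diagonal
matrix, an arbitrary matrix, and another diagonal matrix can be written as a
Hadamard product:

\[
(\operatorname{diag} u)\, A \,(\operatorname{diag} v) = A \odot (u\, v^{T}) .
\qquad (8.21)
\]

Further, for the numerics, one may use

\[
\prod_{a,b \in A,\; a<b} W_{\Phi_{ab}}
 = \operatorname{diag} \Big( \exp \Big( i \sum_{a,b \in A,\; a<b}
     \Phi_{ab}^{r_{\tau(a)} r_{\tau(b)}} \Big) \Big)_{r \in S^{|A|}} .
\]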
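The identity that a diagonal matrix, an arbitrary matrix and another diagonal matrix multiply to a Hadamard product with an outer product is easily verified numerically. The following sketch uses plain nested lists and helper names of our own; it only illustrates why the Hadamard form is preferable in the numerics, as it needs one multiplication per matrix element instead of two full matrix products.

```python
def matmul(X, Y):
    """Ordinary matrix product of two square matrices given as nested lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def diag(u):
    """Diagonal matrix with the entries of u on its diagonal."""
    return [[u[i] if i == j else 0 for j in range(len(u))]
            for i in range(len(u))]


def sandwich(u, A, v):
    """(diag u) A (diag v), computed with two matrix products."""
    return matmul(matmul(diag(u), A), diag(v))


def hadamard_with_outer(u, A, v):
    """A Hadamard-multiplied with the outer product u v^T."""
    return [[A[i][j] * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]
```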


8.5 Demonstration for two models 

To approximate a ground state, we have to vary the parameters in order to
minimize the energy. Before we explain our techniques to achieve this, we
show the results of such minimizations for two different models to demonstrate
the performance of our technique. The two model systems, namely
the XY model and the Bose-Hubbard model, are presented in the two following
subsections. For each of the two models, we have used a different
implementation (see Sec. 8.6 for details) and different heuristics for the global
minimisation. Hence, we shall use these results as examples when explaining
these heuristics in Sec. 8.6. As the second implementation is newer and
its heuristics more sophisticated, the results for the Bose-Hubbard model are
more convincing. Nevertheless, we also present our results for the XY model,
as the old heuristics provide illuminating insights into important aspects of
our method's behaviour. The examples with the XY model are a continuation
of the examples for the Ising model (which is a special case of the XY model)
already given in [APD+06].


8.5.1 The XY model with transverse field 

The XY model with transverse field for a system of spin-1/2 particles on a lattice
is given by the Hamiltonian

\[
H = \sum_{\{a,b\} \in \mathcal{B}}
    \Big( \frac{1+\gamma}{2}\, \sigma^x_a \sigma^x_b
        + \frac{1-\gamma}{2}\, \sigma^y_a \sigma^y_b \Big)
  + \sum_{a \in V} B\, \sigma^z_a ,
\]

where $\sigma^{x,y,z}$ are the Pauli matrices, $\mathcal{B}$ is the set of
nearest-neighbour pairs, $B$ the transverse field, and $\gamma$ is called the
asymmetry. For $\gamma = 0$, we get as a special case the XX model, and for
$\gamma = 1$, we get the Ising model.

Figure 8.2: Phase diagram of the XY model, (a) for 1D (according to [BM71]),
(b) for 2D (according to [Hen84]).


One dimension (spin rings) 

For 1D, the XY model with transverse field can be diagonalised using a
Jordan-Wigner and then a Bogoliubov transformation (the latter is trivial
for $\gamma = 1$). Correlations have been studied in early work in [Pfe69] (Ising)
and [Kat62, BM71] (XY). The latter article also gives the phase diagram of
the 1D XY system (reproduced in Fig. 8.2a). The entanglement properties of
these phases and their transitions have recently attracted much interest. The
behaviour first indicated by numerical studies [VLRK03, LRV04] was soon
confirmed by analytic calculations [JK04, Pes04, IJK05].

Our technique seems to be well suited to study this model: the results are quite
precise. Fig. 8.3 shows a transition through the Ising critical point. The curves
show the XX correlations for different spin-spin distances in a ring of $N = 16$
spins.

As our technique tends to spontaneously break symmetry where the true
ground state does not, it makes sense to plot the two-point correlations⁵ for
many different values of the parameters of the Hamiltonian (here: $B$ and $\gamma$)
in order to spot phase transitions. We find that it works better to plot
correlations for a specific distance than to estimate correlation lengths from the

5 We either plot the XX correlations or the maximum singular value of the correlation matrix 
Figure 8.3: Approximation of the ground state of an Ising ring with 16 spins,
calculated for $m = 4$, i.e., the phase gates act on superpositions of 4 product
states. The orange triangles show the deviation of the variational energy
from the true ground state energy as obtained from the exact solution. The
other symbols show the mean value of XX correlations for different spin-spin
distances. As a guide to the eye, symbols for the same distance are connected
by lines. It is evident that one needs two lines to connect the set of
data points for each distance, as a single line would "jump" in zig-zag near
the critical point. In other words: there seem to be two basins of attraction
for the minimizer, corresponding to the $B < 1$ and $B > 1$ phases, and near
the critical point, the minimizer may fall into either basin.
The use of the sweeping technique, to be discussed in Sec. 8.6.3, allows one to get
around this undesirable behaviour and direct the minimizer to minima with
smaller basins of attraction which better resemble the true ground state in
the area of influence of the quantum phase transition. Hence, this plot is
not meant to show a good result, but rather to illustrate the kind of failure that
motivates and necessitates the sweeping technique.
[Axis label: asymmetry]

Figure 8.4: Correlations (maximum SV of correlation matrix for distance 6)
of the 1D XY model, calculated for a ring of 16 spins, and with $m = 3$. The
brown crosses show the positions of the data points; the colour surface in
between is interpolated using Sibson's method (see Sec. 8.A.5, and be careful
not to be misled by artefacts of the interpolation, such as apparent features
in regions where the data points are too sparse). The correlations appear in
those regions that also have high entanglement in the thermodynamic limit.
(Compare with Fig. 3 in [LLRV05].) However, this agreement is, unfortunately,
only qualitative: a comparison with the exact result for the considered finite
size case, shown in the small plot to the right, reveals that correlations are
over-estimated significantly.


data because the system is still so small that the exponential decay of
correlations is masked by boundary effects. Fig. 8.4 shows such a plot for the 1D XY
model. As is to be expected, one sees that near critical regions correlations are
much stronger. (For the infinite chain, the critical regions are: XX criticality at
$\gamma = 0$ for $0 < B < 1$ and XY criticality⁶ for $B = 1$ [BM71].) The spread of
the areas of high correlation around the critical regions of the infinite chain
looks similar to the areas of high entropy identified in [LLRV05]; compare with
Fig. 3 in that article (and note that there, entropy is small around $\gamma = 0$,
$B = 1$ despite the critical nature of this point, a feature also seen in our plot
of correlations). Had we not known the critical regions, it would not be justified
to conclude that the system is critical where the correlations are strong, as the
system size and the correlation distance are surely too small for this. We rather
suggest using a plot of this kind for a first look at a yet unstudied Hamiltonian.
Regions of high correlations may suggest points in parameter space for which
numerical calculations for different system sizes may give interesting results.

Another interesting feature of the 1D XY model is the Barouch-McCoy circle,
which is defined by $B^2 + \gamma^2 = 1$. On this circle, the ground state has
product form [BM71]. Our approach accurately reproduces the vanishing of

6 Strictly speaking, the model is XY critical only for $B = 1$, $\gamma \neq 1$, and Ising critical for
Figure 8.5: Correlations (maximum SV of correlation matrix) in the vicinity of
the Barouch-McCoy circle. Along this circle, which is defined by
$B^2 + \gamma^2 = 1$, the ground state of the XY model has product form. This is
nicely reproduced by our numerics: at $B = \gamma = 1/\sqrt{2}$, all correlations
vanish. To show this, we here plot the correlations for spin-spin distances 1 (red),
2 (green), and 3 (blue) for a cut along the line $B = \gamma$, i.e., radially through
the circle. The x axis is the value of $B = \gamma$. Calculated for a ring of 16 spins
and $m = 3$ as in Fig. 8.4. The solid, dark lines are results from the variation;
the dashed, light lines are exact values.


all correlations as one approaches a point on the Barouch-McCoy circle (Fig. 8.5).


Two dimensions 

The 2D XY model with transverse field has been studied in [Hen84]. The
main result of that treatment is illustrated by Fig. 8.2b.

In order to demonstrate our scheme in a 2D setting, we have done
calculations for a torus (i.e., a square with periodic boundary conditions) of
$6 \times 6$ spins. We fixed the asymmetry at $\gamma = 0.65$ and varied the field
strength $B$ from 0 to 4.5 in order to cross both of the phase transitions indicated
in Fig. 8.2b. The results, shown in Fig. 8.6, show prominent kinks at the expected
positions of the phase transitions, and the correlations fall off in a roughly
exponential manner with distance, as expected. We still see additional jumps
due to convergence into wrong basins, and this prompted us to seek a means
to avoid this, namely the sweeping technique. The plots in the following section
have been obtained this way and hence do not show such strong jumps.
[Plot title: XY, 6x6, PBC, delta=0.65]

Figure 8.6: Correlations (maximum SV of correlation matrix) for the 2D XY
model, calculated for a torus of $6 \times 6$ spins along a cross section through the
phase plane of Fig. 8.2b along the line $\gamma = 0.65$ (number of superposed states:
$m = 3$). The red arrows show the positions of the two phase boundaries
for the infinite case (according to [Hen84], cf. Fig. 8.2b). As this plot has been
produced without use of the sweeping technique, some instances of convergence
to the wrong minimum are evident from the jumps at $B > 3$. (Compare
with the discussion at Fig. 8.3.) The different curves show the correlation for
spin pairs with distance $(d_x, d_y)$ in x and y direction.
8.5.2 Bose-Hubbard model

The Bose-Hubbard model is defined for a system of harmonic oscillators
arranged in a lattice, and is described by the Hamiltonian

\[
H = -J \sum_{\{a,b\} \in \mathcal{B}} \big( b_a^{\dagger} b_b + \mathrm{H.c.} \big)
    + U \sum_{a \in V} \hat n_a (\hat n_a - 1)/2
    - \mu \sum_{a \in V} \hat n_a . \qquad (8.22)
\]

As before, $V$ is the set of all lattice sites, and $\mathcal{B}$ the set of all
unordered pairs of nearest neighbours. The operators $b_a^{\dagger}$ and $b_a$ denote
the ladder operators to create and annihilate a bosonic excitation of the
oscillator at site $a$, and $\hat n_a = b_a^{\dagger} b_a$ is the number operator. The
first term, called the hopping term, describes the "hopping" of an excitation from a
site $a$ to a neighbouring site $b$, a process which occurs with the hopping strength
$J$. The second term describes the repulsion between several bosons on the same
site. To fix our energy scale, we set the repulsion $U$ to 1 in the following, i.e.,
all dimensionless energies are to be understood in units of $U$.⁷ The last term is
relevant if the particle number is not fixed, which is the case here. Then,
assigning a value to the chemical potential $\mu$ allows one to choose the mean
density of the ground state.

The Bose-Hubbard Hamiltonian is of interest due to its rich phase diagram,
first exposed in [FWGF89]. While its original motivation was the
description of certain structured solid state systems such as arrays of Josephson
junctions, interest in the system increased significantly with the discovery
that it can be realized with cold atoms in optical lattices [JBC+98] and with
the spectacular experimental demonstration of this fact [GME+02], where a
transition from the Mott insulator phase to the superfluid phase and back
was observed. (For a review, see [LSA+07].)

In order to simulate a bosonic system with our ansatz, we restrict the
occupation number at each site. For all the following calculations, we set
the dimension of each site to $n = 5$, i.e., the maximum occupation per site
is $n - 1 = 4$. The creation operator $b^{\dagger}$ is defined such that
$b^{\dagger} |n - 1\rangle = 0$ in order to truncate the Hilbert space.
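The truncated ladder operators can be written down explicitly. The following sketch builds them as plain nested lists in the occupation-number basis $|0\rangle, \dots, |n-1\rangle$; the function names are ours, and `onsite_energy` merely evaluates the last two terms of the Hamiltonian (8.22) for a single site with definite occupation.

```python
import math


def creation(n):
    """Matrix of the truncated creation operator in the basis |0>, ..., |n-1>.

    <k+1| b^dagger |k> = sqrt(k+1); the last column is zero, so that
    b^dagger |n-1> = 0."""
    return [[math.sqrt(row) if row == col + 1 else 0.0 for col in range(n)]
            for row in range(n)]


def annihilation(n):
    """Matrix of the annihilation operator (transpose of the real matrix above)."""
    b_dagger = creation(n)
    return [[b_dagger[col][row] for col in range(n)] for row in range(n)]


def number(n):
    """Number operator b^dagger b, which is diag(0, 1, ..., n-1)."""
    b_dagger, b = creation(n), annihilation(n)
    return [[sum(b_dagger[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def onsite_energy(k, U, mu):
    """On-site part U k (k - 1) / 2 - mu k for a site occupied by k bosons."""
    return U * k * (k - 1) / 2 - mu * k
```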

A good way to distinguish the Mott insulator from the superfluid phase
is to look at the mean compressibility

K = ^ E \l 

flE V * 

which is strongly suppressed in the Mott insulator phase. Our results show
the form of the phase diagram with impressive clarity (Fig. 8.7). Although each
data point only required a rather quick and rough calculation, one gets a
7 When comparing with other literature, care has to be taken that many authors use the
alternative convention to set $J = 1$. Also, $J$ is often denoted $t$.
Figure 8.7: Mean compressibility $\kappa$ of a $4 \times 4$ lattice (with PBC) of
Bose-Hubbard sites as a function of the Hamiltonian parameters $J$ and $\mu$. One can
clearly see the Mott insulator lobes, characterised by low values (0 in the
infinite case) of $\kappa$, for densities $\rho = 1, 2, 3$, and the surrounding
superfluid phase. The crosses in cyan mark the parameter points for which a
calculation was performed. To depict the values, a surface was interpolated between
these points and is used to colour the plot. Unfortunately, we could not fully
get rid of artefacts due to the interpolation (see 8.A.5), and in those regions
where the density of data points varies, the contour lines become distorted.
The resulting "wobbling" in regions with sparse data is hence not genuine
and should vanish if one adds more data points. As cuts through the plane
do not suffer from these presentation problems, we have calculated many
more points for three fixed values of $J$ and show these cuts in Fig. 8.8.
Figure 8.8: As evident from the crosses in Fig. 8.7, we have many data points
for $J = 0.02, 0.04, 0.062$, which allow us to plot vertical cuts through the plane
of Fig. 8.7. Here, we show the values of the average occupation number per
site $\rho$ (a) and the compressibility $\kappa$ (b) for the mentioned three values of
the hopping strength $J$. The curve for $J = 0.02$ looks smoothest because these
points have been calculated to higher accuracy (cf. Fig. 8.10).



Figure 8.9: For values of the density $\rho$ corresponding to very low (and
integer) total particle numbers, we can obtain exact values for energy $E$ and
compressibility $\kappa$ from diagonalisation of the full Hamiltonian. This shows
that our method has good accuracy. (Note that the plot corresponds to a very
small section of the steep left-most flank in Fig. 8.8b.) For the $J = 0.02$ curve,
which has been calculated to especially high accuracy, the exact values for
$\rho N = 4, 5$ (i.e., as $N = 4^2$: $\rho = 0.25, 0.3125$) coincide with the
approximated and interpolated values with an absolute deviation of only $10^{-4}$.
Even for the values for $J = 0.04$, which have been obtained with much fewer sweeping
steps, the deviation is below $2 \cdot 10^{-3}$.
[Plot title: BH, 4x4, J=0.02]

Figure 8.10: Non-local properties of the ground state are much harder to obtain
than local ones. In order to see whether our technique is also capable of
yielding good results here, we calculated (for $J = 0.02$ and varying values
of $\mu$ for the $4 \times 4$ sites Bose-Hubbard system) the density-density
correlation $\gamma := \frac{1}{N} \sum_{a \in V} \langle \hat n_a \hat n_{a'} \rangle - \rho^2$,
where $a'$ denotes the site that is one or two lattice step(s) to the right
(denoted with (1,0) and (2,0)). (Correlations of this
kind have, incidentally, been studied recently in [PRB06].) The connected
points show the results obtained with our approximation; the isolated points
have been calculated using the worm code (quantum Monte Carlo technique)
[TAH98] of the ALPS project [ADG+05] (at finite, but low temperature). The
agreement is qualitatively good but quantitatively not very precise. (Actually,
the precision of the two-point correlations is unfortunately insufficient to obtain
a good picture of the momentum distribution from their Fourier transformation.)



Figure 8.11: Ground state approximation for Bose-Hubbard systems with up
to $32 \times 32$ sites. The figure shows the compressibility $\kappa$ as a function of
the chemical potential $\mu$ at the phase transition between the $n = 1$ Mott lobe and
the superfluid phase above it (for a coupling of $J = 0.02U$). As one can see,
the numerics cope with the large number of sites but fail to converge with
a precision sufficient to make out clear finite-size scaling trends. The reason
seems to be that the number of local minima increases very strongly with system
size: while the use of the sweeping technique allowed us to get a smooth curve
for the $4 \times 4$ case, our attempts on the larger systems failed to get smoother
curves than shown here. (We put considerably more effort into the data for
$\mu < 0.8345$, but even there, the curves are still very noisy.)
good overview of the ground state properties in dependence on the Hamiltonian
parameters $J$ and $\mu$. To better show the quantitative features, we have
also plotted vertical cuts through the $(J, \mu)$ plane (Fig. 8.8). Especially for
the points at $J = 0.02$, the zigzag sweeping technique (described later in Sec.
8.6.3) was used to improve accuracy by more than an order of magnitude.
This can also be seen from Fig. 8.9: in this plot, we compare the observable
$\kappa$, calculated for our approximant states, with exact values. To allow for this
comparison, we have included values from exact Lanczos diagonalisation.
We are grateful to G. Pupillo, who supplied these numbers to us. He used a
program, written for another project and using ARPACK [LMSY96], that
allows one to diagonalize a small Bose-Hubbard system exactly if the number of
particles is small as well. For a $4 \times 4$ system, up to approx. 6 particles in the
16 sites can be treated. This corresponds to the very beginning of the plots
of Fig. 8.8, which we have magnified in Fig. 8.9. The accuracy of $10^{-4}$ for the
compressibility competes well with the precision attainable with quantum
Monte Carlo techniques.

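While the compressibility is a local observable, the more challenging task
is to study non-local observables such as density-density correlations of the
form

\[ \gamma := \frac{1}{N} \sum_{a \in V} \langle \hat n_a \hat n_{a'} \rangle - \rho^2 , \]

where $a'$ is the site which has a fixed position relative to $a$, i.e., the phase
index $\nu(a, a')$ is the same for all $a$.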
In Fig. 8.10, we attempt this task for a $4 \times 4$ lattice. Fig. 8.11 shows
calculations for larger systems, up to $32 \times 32$ sites. While the noise present in
the latter plot is small on an absolute scale (note that the plot zooms in to a quite
small parameter region), it is unfortunately still too large to allow us to do
finite-size scaling.

8.6 Performing the minimisation 

Usually, the Hamiltonian of a spin system is given in the form of a sum of
terms each of which has support on only a small number of spins (one or two
in most physical cases). When the terms acting on single spins are absorbed
into those acting on two spins, such a Hamiltonian can be written as

\[ H = \sum_{(a,b) \in \mathcal{B}} H_{ab} , \]

where $\mathcal{B}$ is the set of all pairs of spins on which a term acts jointly.
These pairs are called bonds in the following, and they typically (but not
necessarily) form a regular lattice. The bond Hamiltonians $H_{ab}$ may all be equal
or not, and only in the former case can the simplifications of Sec. 8.3.4 be used.
The minimisation problem Eq. (8.1) that we have to solve then takes the
form

\[
E_{\min} = \min_{x \in \mathbb{R}^K} E(x), \quad \text{with} \quad
E(x) = \sum_{(a,b) \in \mathcal{B}} \operatorname{tr} H_{ab}\, \rho_{ab} .
\]

Finding a minimum of a general function of many parameters is a thoroughly
researched but intrinsically hard problem. Our approach is described
in the following. As we do not assume the reader's familiarity with numerical
optimisation, we will explain some textbook knowledge.


8.6.1 Local search 

Given a starting point $x_0 \in \mathbb{R}^K$ in parameter space, the problem of
local search or local minimisation is the task of finding a local minimum in the
vicinity of $x_0$. An exhaustive treatment of this topic can be found in the
standard textbook [NW99], which covers all of the algorithms mentioned in the
following in detail. In our case, we have to deal with unconstrained (i.e., all
values of the unbounded space $\mathbb{R}^K$ are admitted) nonlinear (i.e., the
energy function $E(x)$ does not have any simple structure that would allow the use
of a more powerful, specialised algorithm) local minimisation. Algorithms for this
case come in two classes: so-called direct methods only require a means to evaluate
the function at any given point, while gradient-based methods also require a
means to obtain the gradient $\nabla_x E(x)$ at any given point.

Direct methods are convenient, but comparatively slow. For very small
systems (chains of up to 6 spin-1/2 sites, corresponding to fewer than 100
parameters), we could achieve convergence with direct methods, using the two most
common ones, Nelder-Mead [NM64] and Powell [Pow64] minimisation, with
Powell minimization converging faster.

For any meaningful system size, however, direct methods are much too
slow. Hence, we coded routines to obtain the derivatives of $E$ w.r.t. all kinds
of parameters.⁸ This required rather tedious calculation and coding, and the
formulae and their derivation are summarised in 8.B.

Using the gradient functions, we tried the standard minimisation
methods the literature offers, namely the Fletcher-Reeves conjugate-gradient
method, the Polak-Ribière conjugate-gradient method and the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. We started using the
implementations provided by the GNU Scientific Library [GTJ+03], which,
however, turned out to be not robust enough. Nevertheless, it could be
established that convergence speed for our problem is as usually expected,
i.e., Fletcher-Reeves (the oldest of the algorithms, from 1964) performs worst
and BFGS (the newest, from the 1970s) performs best.

8 Note that it is not helpful to obtain the gradient by numerical differentiation, as this is 
hardly faster than using a direct method. 
BFGS is a so-called quasi-Newton or Davidon algorithm. This means that it
uses the gradients obtained at the points visited so far to build up an estimate
of the Hessian of the function (assuming that the Hessian varies only
slowly). The approximation to the Hessian (or more precisely, to the inverse
of the Hessian) is then used to make a good guess for the next step. As the
approximant is, like the Hessian, a $K \times K$ matrix, updating it at each step
requires $O(K^2)$ operations, which scales worse than the calculation of $E(x)$, and
hence, maintaining the BFGS data becomes more expensive than evaluating
the function.

The textbook solution to this problem is to use the "limited memory"
variant⁹ of BFGS, which is known as L-BFGS and stores only a list of the last, say
25, gradients, and uses this data to produce a Hessian approximant "on the
fly" [BLN95]. We have used the L-BFGS-B Fortran code [ZBLN97], which is
very robust, not least due to the excellent line search routine [MT94] that
it uses.
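How a Hessian approximant can be applied "on the fly" is illustrated by the standard two-loop recursion of L-BFGS as described in textbooks. The following is a minimal sketch and not a transcription of the Fortran code used for our calculations; it assumes that for each remembered iteration the step `s` (difference of positions) and `y` (difference of gradients) have been stored, with `y . s > 0`, and all names are ours.

```python
def dot(u, v):
    """Scalar product of two real vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))


def lbfgs_apply(grad, pairs):
    """Apply the implicit inverse-Hessian approximant to the vector grad.

    pairs is a list of (s, y) tuples, oldest first. The cost is linear in
    the number of parameters times the number of stored pairs; no K x K
    matrix is ever formed. The search direction is minus the result."""
    q = list(grad)
    history = []
    for s, y in reversed(pairs):            # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [q_i - alpha * y_i for q_i, y_i in zip(q, y)]
        history.append((alpha, rho))
    if pairs:                                # common initial scaling
        s, y = pairs[-1]
        scale = dot(s, y) / dot(y, y)
    else:
        scale = 1.0
    r = [scale * q_i for q_i in q]
    for (s, y), (alpha, rho) in zip(pairs, reversed(history)):  # oldest first
        beta = rho * dot(y, r)
        r = [r_i + (alpha - beta) * s_i for r_i, s_i in zip(r, s)]
    return r
```

A useful property for testing is the secant condition: applying the approximant to the most recent gradient difference `y` returns the most recent step `s`.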

A problem is the stop condition, which decides when convergence is
assumed. We have tried several approaches: watching the norm of the gradient,
the size of the steps, and the difference of the function value per step; these
were either taken point-wise, or averaged over the last, say, 30 or 100 steps, or
taken as the maximum over the last 30 or so steps. None of this could clearly
predict convergence, as there seem to be long shallow slides, which tempt
one to stop minimisation prematurely. In the end, we found that waiting
until progress gets below machine precision is most viable. However, in the
sweeping technique, described later, the minimization can be stopped once it
seems advantageous to first continue with a neighbour.

The described technique only allows us to find local minima. How do we
find a good local minimum, or even the global one? Although the literature
discusses many different heuristics and algorithms, finding a good scheme is
far from trivial. We have developed and tested two different
heuristics, which are explained in the following two subsections. Both
heuristics are two-phase methods [Sch02], i.e., they combine a global driving
scheme, which chooses points from which to start a local search, with the
local-search algorithm just discussed.

In both cases the minimization for a specific tuple of Hamiltonian parameters
is typically performed several times. Whenever a new energy value is
found which is lower than all values that have been found so far for this
parameter tuple, the previous energy value and the corresponding state are
replaced by the new ones. Hence, the longer one performs these heuristics, the
nearer one gets to the true ground state (or, more precisely, to the lowest-lying
state within the variational class). We emphasize that we never discard a data
point unless it has been "underbid" by a new calculation. This makes the
result objective even if subjective judgement has been used in carrying out the
heuristics.

9 The term "limited memory" shows that the problem of keeping the full matrix was then,
in the 1980s, not so much seen in the time it takes to update the matrix but rather simply in
the fact that a large matrix might not fit into the memory of the computer.

8.6.2 Multi-start and step-wise adding of superpositions 

For the calculation of the results for the XY model (presented in Sec. 8.5.1), we
tried several different heuristics in order to "move around" local minima. We
finally settled on the "multi-start" scheme described now, which turned out
to work best, at least for the examples we studied: We started by choosing
the parameters $x_0$ for a state $|\Psi(x_0)\rangle$ with $m = 1$ (i.e., without
superpositions) uniformly at random from $[-5, 5]^K$ and then used the L-BFGS
algorithm (as explained in Sec. 8.6.1) to go downhill from there towards a minimum.
We allowed this minimisation to run only for a limited number of function
evaluations (typically a few hundred, or up to 1000), and then restarted with
another randomly chosen initial point. Having done a number (say 15) of
such "trial runs", we kept the one that reached the lowest energy within the
limited number of steps and discarded the other data. The best run was then
allowed to continue for a significantly longer time, until the maximum number
of "main run" steps (typically several thousand function evaluations) was
exhausted or the energy change fell below machine precision. Then, we
increased the number $m$ of superpositions by one. This makes the parameter
vector $x$ longer, i.e., $2N + 2$ real numbers have to be added (for $N$ complex
deformation parameters and one complex superposition coefficient, cf. Eq.
(8.10) for $n = 2$). These are again chosen at random, but without changing
the parameter values that have already been found. (It also helps to choose
the new value for $\alpha_m$ with small modulus, such that the new parameters do
not let the state stray too far from the already established good state.) Again,
a number of trial runs is started, with different random numbers to extend
the parameter vector, and the best one is allowed to continue for many more
steps in the main-run phase. This iterative extension of the parameter vector was
repeated until $m$ reached a certain value. This value does not have to be very
large: for the results presented in Sec. 8.5.1, $m = 3$ was sufficient.
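The trial-run and main-run logic for a fixed number of superpositions can be sketched as follows. This is a hedged illustration only: `multi_start`, `gradient_descent` and the toy energy function are hypothetical names, and the simple adaptive-step gradient descent merely stands in for the L-BFGS-B minimiser that was actually used. The default numbers (15 trial runs, a few hundred evaluations per trial run, several thousand for the main run, starting box $[-5, 5]^K$) follow the description above.

```python
import random

def gradient_descent(energy_and_grad, x0, max_evals, step=0.01):
    """Stand-in local search (NOT L-BFGS): adaptive-step gradient descent."""
    x = list(x0)
    e, g = energy_and_grad(x)
    evals = 1
    while evals < max_evals and step > 1e-16:
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_new, g_new = energy_and_grad(trial)
        evals += 1
        if e_new < e:      # accept the step and become bolder
            x, e, g = trial, e_new, g_new
            step *= 1.2
        else:              # reject the step and become more careful
            step *= 0.5
    return x, e

def multi_start(energy_and_grad, dim, local_search, n_trials=15,
                trial_evals=200, main_evals=5000, box=5.0, seed=0):
    """Short trial runs from random points, then one long main run."""
    rng = random.Random(seed)
    best_x, best_e = None, float("inf")
    for _ in range(n_trials):
        x0 = [rng.uniform(-box, box) for _ in range(dim)]
        x, e = local_search(energy_and_grad, x0, trial_evals)
        if e < best_e:     # keep only the best trial run
            best_x, best_e = x, e
    return local_search(energy_and_grad, best_x, main_evals)

def toy_energy_and_grad(x):
    """Toy landscape with two minima per coordinate (illustration only)."""
    e = sum(xi**4 - 8.0 * xi**2 + 3.0 * xi for xi in x)
    g = [4.0 * xi**3 - 16.0 * xi + 3.0 for xi in x]
    return e, g

x_best, e_best = multi_start(toy_energy_and_grad, dim=2,
                             local_search=gradient_descent)
```

Extending the parameter vector for the next value of $m$ would amount to appending new random entries to `x_best` and repeating the same two stages.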

A disadvantage of this heuristic is evident in Fig. 8.3: Some points are
much worse than their neighbours. For example, the point $B = 1.09$ shows
a sharp peak towards worse accuracy (orange line), while its neighbours on
both sides are better. What happened is that near the phase transition, the
two phases compete to govern the ground state, and once the minimiser gets
trapped in the catchment basin of one of the two phases, it cannot switch to the
other one. In most cases, the multi-start scheme will allow us to enter the
main run within the catchment basin of the correct phase. If, however, the
minimum energies of the two competing phases are very close, they cannot be
distinguished during the short and rough trial runs, and it depends on mere
chance to which phase we converge. The obvious solution is to use the parameter
values of neighbours which seem to have converged to lower energies as starting
points, in order to see whether this allows us to reach lower energies. This is
the strategy that we tried next.

Figure 8.12: Using derivatives to locally judge the quality of the approximant.
Red (left) dashed vertical line: bad point; green (right) dashed vertical line:
good point. For an explanation see the main text (Sec. 8.6.3).

8.6.3 Zigzag sweeping 

All the results for the Bose-Hubbard model (as presented in Sec. 8.5.2) have
been obtained without the use of superpositions to the right of the phase gates,
i.e., with $m = 1$. Accuracy was instead improved in an iterative way using
the following heuristic: Start minimisations from parameter vectors chosen
at random for a variety of different values of the Hamiltonian parameters (i.e.,
chemical potential $\mu$ and on-site repulsion $J$ for the Bose-Hubbard
Hamiltonian) within the area of interest. Once the minimisations have more or less
converged, compare each point with its neighbours. If one point looks better
than a neighbouring one, use this point's parameter vector to start a minimisation
for the neighbouring point's Hamiltonian parameters.

In order to see how to do this in an objective way, look at the example of
the red curve of Fig. 8.8. There, $J = 0.02$ is kept fixed and $\mu$ varies from $-0.08$
to $2.7$. The data points are spaced rather closely ($\mu$ varies in steps of 0.003
up to 0.15). Hence, if we plot the energy $E$ versus the chemical potential $\mu$
and zoom in to look only at a few adjacent data points, we may expect to
see simply a straight line. If the points lie close enough, any deviation from
linearity is unlikely to have physical reasons and is rather due to the different
quality (i.e., proximity to the global minimum) of the approximation at the points.
Hence, we can interpret slight deviations towards higher [lower] energy as
an indication that the point is a worse [better] approximant than its
neighbours. As the slope varies too little to see these differences clearly, it is
helpful to take the second numerical derivative to enhance them. Following
the sketch in Fig. 8.12, a simple heuristic emerges: A pronounced peak in
the second derivative means that the corresponding point is a better
approximant than its neighbours. Hence, use its parameter vector as initial values to
redo the minimisation at the neighbouring Hamiltonian parameters. If one
of the two neighbours has a much lower second derivative than the other,
redo only this one. Conversely, a point with a pronounced dip in the second
derivative should be redone, starting with the parameter vector of one of
its neighbours, normally the one with the higher second derivative.
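The second-derivative criterion can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the procedure as it was actually carried out (which involved judgement by eye): the function names and the `threshold` parameter are hypothetical, the finite-difference formula is the usual three-point one for unevenly spaced data, and a peak always triggers a restart of both neighbours rather than only the worse one.

```python
def second_derivative(mu, energy):
    """Second numerical derivative of E(mu) at the interior points of a
    possibly unevenly spaced grid; the two end points get None."""
    d2 = [None] * len(mu)
    for i in range(1, len(mu) - 1):
        h1 = mu[i] - mu[i - 1]
        h2 = mu[i + 1] - mu[i]
        left = (energy[i] - energy[i - 1]) / h1
        right = (energy[i + 1] - energy[i]) / h2
        d2[i] = 2.0 * (right - left) / (h1 + h2)
    return d2

def suggest_restarts(mu, energy, threshold):
    """Return sorted (target, source) index pairs, meaning: redo the
    minimisation at `target`, starting from the parameters of `source`."""
    d2 = second_derivative(mu, energy)
    jobs = set()
    for i in range(1, len(mu) - 1):
        if d2[i] > threshold:
            # Peak: point i lies below the line through its neighbours,
            # i.e. it is the better approximant. Restart both neighbours.
            jobs.add((i - 1, i))
            jobs.add((i + 1, i))
        elif d2[i] < -threshold:
            # Dip: point i is worse. Restart it from the neighbour with
            # the higher second derivative (if that one is known).
            known = [j for j in (i - 1, i + 1) if d2[j] is not None]
            source = max(known, key=lambda j: d2[j]) if known else i - 1
            jobs.add((i, source))
    return sorted(jobs)
```

For perfectly linear data no restart is suggested; lowering a single point makes that point the source for both of its neighbours.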

Close to a phase transition, the procedure may get stuck because the step
from one neighbour to the next changes the state too much. In this case, one
should insert new data points between the point that failed to get better and
the neighbouring point used for the initial value.

8.6.4 Outlook to other minimization techniques 

The literature on unconstrained nonlinear minimization is vast, and finding a
good global minimization scheme requires a lot of trial and error. Apart from
the two-phase heuristics described above, we have also tried genuine global
minimization techniques, namely simulated annealing [KGV83] and differential
evolution [SP97]. Both are genuinely global in the sense that they do not
employ a local search stage. However, they thus cannot take advantage of the
possibility of calculating the gradient. Hence, it is not surprising that simulated
annealing converged much too slowly to be of use. (Simulated annealing is
used in many different fields with much success, but usually for functions
with a convoluted potential surface but only few variables. We have several
hundreds or even thousands of variables.) Differential evolution is a genetic
algorithm and shows the (at first sight surprising) feature of converging
to the mean-field solution. (This seems explicable from the fact that crossing
two genotypes in different basins has to end up in a "compromise", which is
mean field.)

One further possibility might be basin hopping, which is a family of
techniques (reviewed in [WS99]) that combine simulated annealing with a local
search phase in order to overcome the problem stated in the previous
paragraph. These ideas are quite recent and research is still ongoing. So far,
however, it seems that basin hopping requires very many local
searches, which hence have to converge fast. This is unfortunately not so in
our case. It seems conceivable that variants can be developed that only use
rough and hence fast local searches, and this might be a way to proceed with
our method.

Another ansatz is to use a clustering stage in the global phase of a
two-phase method [RT87]. This makes multi-start much more efficient
but has two difficult requirements: (i) One needs to factor any degeneracies
in the minima out of the parameter space. We have not yet studied whether
this is possible. (ii) The number of local minima must be small enough that
one has a decent chance to encounter all of them during the local searches.
Unfortunately, especially the calculations for Fig. 8.11 have led us to the
observation that the number of minima seems to grow very fast with the
system size.

A further technique that we have tried is imaginary time evolution, which
works as follows. Given an initial state $|\Psi(x_0)\rangle$ chosen at random, we can find
an estimate $|\Psi(x_{i+1})\rangle \approx \operatorname{nrm} e^{-\Delta t H} |\Psi(x_i)\rangle$ for the discretized evolution of
the state under the system Hamiltonian in the imaginary time direction. As
for most initial states $|\Psi_0\rangle$, $|\Psi_g\rangle = \lim_{t\to\infty} \operatorname{nrm} e^{-tH} |\Psi_0\rangle$ is the ground state, this
iterative evolution should converge to a good approximation of the ground
state. We have decomposed $e^{-\Delta t H}$ into a product of bond terms $e^{-\Delta t H_{ab}}$
using a standard Trotter decomposition and then tried to find the $\Delta x \in \mathbb{R}^K$ that
maximizes the overlap

$$\frac{\langle\Psi(x+\Delta x)|\, e^{-\Delta t H_{ab}} \,|\Psi(x)\rangle}{\sqrt{\langle\Psi(x+\Delta x)|\Psi(x+\Delta x)\rangle\,\langle\Psi(x)|\, e^{-2\Delta t H_{ab}} \,|\Psi(x)\rangle}}.$$

Unfortunately, the maximization failed to give good results even for arbitrarily
small time steps $\Delta t$, and we thus abandoned this approach.
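The principle of imaginary time evolution itself, i.e. repeated application of $e^{-\Delta t H}$ followed by normalisation, can be illustrated on a small dense matrix. The sketch below is not the variational scheme described above (there is no Trotter decomposition and no projection back onto the variational class); the function name and the tiny test Hamiltonian are assumptions for illustration, and the propagator is built from a full diagonalisation, which is of course only feasible for toy problems.

```python
import numpy as np

def imaginary_time_ground_state(H, dt=0.1, steps=2000, seed=0):
    """Iterate psi -> nrm exp(-dt*H) psi for a small Hermitian matrix H."""
    rng = np.random.default_rng(seed)
    evals, evecs = np.linalg.eigh(H)
    # exp(-dt*H), built from the eigendecomposition; the spectrum is shifted
    # so that the largest factor is 1 (the shift cancels on normalisation).
    weights = np.exp(-dt * (evals - evals[0]))
    propagator = (evecs * weights) @ evecs.conj().T
    n = H.shape[0]
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)   # random start
    psi /= np.linalg.norm(psi)
    for _ in range(steps):
        psi = propagator @ psi
        psi /= np.linalg.norm(psi)                       # the "nrm" step
    energy = np.real(np.vdot(psi, H @ psi))
    return energy, psi

H_toy = np.array([[1.0, 0.5], [0.5, -1.0]])
energy_toy, psi_toy = imaginary_time_ground_state(H_toy)
```

For this toy Hamiltonian the iteration converges to the lowest eigenvalue, $-\sqrt{1.25}$, because every step suppresses the excited component by a factor $e^{-\Delta t\,\cdot\,\text{gap}}$.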

We should also mention that for some of the results of our previous
paper [APD+06], we (actually, M. Plenio, who programmed this part) have
used a Rayleigh minimization technique: One restricts the energy function
$E(x)$ in the sense that one keeps all but a few parameters fixed. For certain
such subsets of only a few parameters, namely for the set of parameters
corresponding to a single local unitary or to the phases and deformations for one
pair of qubits, one can write the restricted energy function as a quotient of two
quadratic forms. This is also known as a generalized Rayleigh quotient, and
the global minimum can be found via a generalized eigenvalue problem. Such
a "global minimum" typically is, however, not even a local minimum of the
full energy function. The reason that we got good results for the Ising model
in [APD+06] seems now, in retrospect, to have been the extraordinarily
benign form of the corresponding energy landscape. Hence, and because the
scheme cannot easily be generalized to spins higher than 1/2, we did not pursue
this any further.
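How a generalized Rayleigh quotient is minimised via a generalized eigenvalue problem can be sketched as follows. This is an illustration under assumptions, not the code used for [APD+06]: the matrices `A` and `B` stand for the two quadratic forms of the restricted energy function, `B` is assumed to be positive definite, and the reduction to a standard eigenvalue problem by a Cholesky factorisation is one common choice among several.

```python
import numpy as np

def minimise_rayleigh_quotient(A, B):
    """Minimise (x^dagger A x) / (x^dagger B x) for Hermitian A and
    positive-definite Hermitian B. Returns (minimum value, minimiser)."""
    L = np.linalg.cholesky(B)              # B = L L^dagger
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.conj().T         # equivalent standard problem
    evals, evecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    x = L_inv.conj().T @ evecs[:, 0]       # transform the eigenvector back
    return evals[0], x

A_toy = np.diag([1.0, 2.0])
B_toy = np.diag([2.0, 1.0])
value_toy, x_toy = minimise_rayleigh_quotient(A_toy, B_toy)
```

In the toy example the quotient takes the values 0.5 and 2 along the two coordinate axes, and the routine returns the smaller one.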


8.7 Conclusion and outlook 

To conclude, we have presented a class of variational states that holds
promise to approximate the ground states of spin systems and bosonic
systems. The advantageous properties of this class are that it includes
states with arbitrarily high entanglement and that it can be adapted to
arbitrary geometries and numbers of spatial dimensions. We have shown
how to calculate expectation values of observables for these states and
demonstrated the approximation of the ground state for two model systems,
namely the XY spin-1/2 model and the Bose-Hubbard model, in one and two
dimensions. Furthermore, we have explained heuristics suitable to drive the
minimization.

The method works for small systems and maps out the rough structure of
phase diagrams. (The system sizes, though small, were sufficient to see the
phase boundaries even though phase boundaries are defined, strictly
speaking, only in the thermodynamic limit.) We can calculate observables for
states in systems of considerable size but have problems in approximating the
ground state in larger systems to a precision sufficient to see actual differences
between different system sizes and hence to do finite-size scaling studies.

It seems likely that this is not because there are no states in our
variational class which are close enough to approximate such ground states well.
Rather, we simply cannot find them because our minimization gets trapped
in local minima. Can we avoid this? This is the crucial question for the
future development of the scheme, and at this moment, we may only offer some
thoughts on that: It seems unlikely that another generic global
minimization algorithm is able to steer around these local minima better than
those algorithms that we have tried. For further progress, it hence seems
desirable to have a better understanding of the shape and structure of the
manifold $\Psi(\mathbb{R}^K) \subset \mathcal{H}$, i.e., our variational set of states as described by the
mapping from the parameter space. Is, for example, this manifold "folded"
more and curved more strongly than the equi-energy surfaces of typical system
Hamiltonians? This might explain why there are so many local minima,
and getting a better grasp of the topology and metric of the mapping $\Psi$ and its
image could be most helpful in finding a better way to steer towards good
minima.

8.A Appendix A: Notes on the implementation

8.A.1 Avoiding overflows 

A certain detail is worth mentioning as it may cause some difficulty in the
implementation: As the product (8.15) contains $O(N)$ factors, its value grows
exponentially with the system size $N$. Even for factors which are quite close to
1, the value will leave the range of floating-point arithmetic (on most
computers, ca. $10^{-308} \ldots 10^{308}$) for even moderate values of $N$. To avoid this, one
has to compute the product by summing up the logarithms of the matrix
elements of $\rho_{abc}^{jk}$, then subtracting a constant from this sum, and then
exponentiating the result component-wise. The subtraction of the constant does not
change the final result, as it formally cancels against the final normalisation
to unit trace. The exact value of the constant is hence irrelevant, but it has to
be chosen large enough to avoid a floating-point overflow during
exponentiation, yet not so large that the elements of all the matrices $\rho_{ab}^{jk}$ vanish due to
floating-point underflows. (That some elements of some of the matrices $\rho_{ab}^{jk}$
suffer an underflow is unavoidable but harmless, as their
contribution to the result is evidently insignificant.) Especially for large systems,
the constant has to be readjusted during the minimization.
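The log-domain trick can be sketched with NumPy as follows. This is an illustration under assumptions rather than an excerpt from "ewgs" or "hwgs": the function name is hypothetical, the factors are taken to be small matrices without exact zeros, and the constant is simply chosen as the largest real part of the summed logarithms, so that the largest element has modulus 1 after exponentiation.

```python
import numpy as np

def normalised_hadamard_product(factors):
    """Component-wise product of many matrices, normalised to unit trace,
    computed via sums of (complex) logarithms to avoid overflow."""
    log_sum = np.zeros(factors[0].shape, dtype=complex)
    for f in factors:
        log_sum += np.log(f.astype(complex))   # principal branch
    # Subtract a constant; it cancels against the normalisation below.
    log_sum -= log_sum.real.max()
    rho = np.exp(log_sum)                      # component-wise exponential
    return rho / np.trace(rho)

factor = np.array([[3.0, 1.0], [1.0, 2.0]])
small = normalised_hadamard_product([factor] * 3)
big = normalised_hadamard_product([factor] * 2000)
```

A direct product of 2000 such factors would exceed the floating-point range ($3^{2000}$), whereas the log-domain version stays finite; the element that underflows to zero in `big` is exactly the harmless kind of underflow mentioned above.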

8.A.2 Choice of programming languages 

We have written two implementations of our algorithm. The first one, called
"ewgs", is specialised for spin-1/2. It was used for the results on the XY model
(Sec. 8.5.1), and also for the results presented in [APD+06].¹⁰ The other, more
recent program is called "hwgs" and may be used for spins of any size $n$.
"ewgs" is mainly written in C++; only the outer drivers are written in Python.
Python [R ] is a very modern, quite powerful scripting language that
features high-performance just-in-time compilation, an exceptionally
comprehensive low- and high-level library, an open-source license and excellent
inter-platform portability. The development of a numerics library for Python
has reached maturity quite recently with the release of NumPy [Num, Oli06].
Due to the higher level of the language, development is much faster in Python
than in C++. This makes it advisable to do most of the coding in Python
and only write the "hot spots", i.e., the proverbial 10% of the code in which
the processor spends 90% of the time, in an optimizing compiled language
such as C++. This approach, though it may sound unusual to a traditionally
oriented computational physicist, has been used in several places with much
success (see e.g. the advocacy in [BCG05]), and from our experience, we
clearly recommend its use. Hence, for our second implementation, "hwgs",
we followed this paradigm consistently and wrote only a small part in C++.
This part was bound to the main Python code using SWIG [B+]. For the
local minimizer we used in both implementations the L-BFGS-B Fortran code
[ZBLN97], linked to Python with the help of the tool f2py [Pet].

8.A.3 Performance 

The performance of the "hwgs" implementation can be seen in Fig. 8.13. The
blue curve shows the time required to calculate the energy and the full gradient

10 For completeness, we should point out one difference between the description in this
article and the implementation: In "ewgs", the unitaries are not parametrised using the
Cayley transform, but rather as a linear combination of the identity and the Pauli matrices:

. , _ uol+uia x +u z cry+U3cr z 
U0+I(j+ U 2+ U 3 ’ 
Figure 8.13: Performance of our "hwgs" implementation: For a Bose-Hubbard
system on a square lattice of varying size, the calculation time for
a single reduced density matrix of a pair of sites (red diamonds) and for the
full gradient of the energy (derivatives w.r.t. all parameters) (blue squares)
is shown. These calculations have been done for states with $n = 5$ levels per
site and no superpositions ($m = 1$). The program was run on AMD Opteron
machines clocked at 2.2 GHz.


for one parameter vector at various system sizes. In order to see the time
required to find a good approximant, this has to be multiplied by the number
of function evaluations needed by the minimiser.

Usually, one wants to find approximants for several different values of
the Hamiltonian parameters. Then, one can save much time by running these
minimisations in parallel if one has access to a computer cluster.

8.A.4 Availability 

We would welcome seeing our code used in further projects. Hence,
researchers who are interested in applying our code in their own projects are
encouraged to contact the authors.

8.A.5 Density plots 

The plotting technique used to obtain Figs. 8.4 and 8.7 merits a brief
explanation. For these plots, we calculated the plotted quantity at different value
pairs for the quantities on the x and y axes. In order to work out interesting
features, we did not evaluate on a fixed grid but rather started with some
loosely spaced points to get an overview and then added more and more
points in regions with interesting features. This allowed us to "explore" the
parameter plane. However, it leaves us with a list of data points at irregular
positions, which makes the usual 3D mesh plots unsuitable (as a mesh plot
requires data on a regular grid). This is why we visualize the data instead
with density plots, using colour to indicate the z height. To obtain the colours we
interpolated between the data points, and for this, we experimented with two
interpolation algorithms, namely Akima's spline method [Aki96] and
Sibson's natural neighbours method [Sib81]. As the former has problems with
strongly varying curvature (and this is the case here: the data varies more
strongly near the phase transition than in the interiors of the phases), we used
Sibson's method and produced Figs. 8.4 and 8.7 with the help of the NATGRID
implementation [Cla04] of Sibson's algorithm.


8.B Appendix B: Calculating the gradient of the 
energy with respect to the parameter vector 

For use in the gradient-based minimisation we need a fast way to obtain the
gradient of the energy function $E(x)$. For the following, we assume that the
Hamiltonian can be written in bond form,

$$H = \sum_{(a,b)\in B} H_{ab}.$$

As before, $B$ is the set of bonds, i.e., of pairs of interacting spins. In many
cases $H_{ab}$ is the same for all bonds $ab$, but having an inhomogeneous
Hamiltonian is no complication.

As the energy function is given by $E(x) = \sum_{(a,b)\in B} \operatorname{tr} H_{ab}\,\rho_{ab}$, its gradient
consists of a sum of derivatives of the reduced density matrices,

$$\frac{\partial E(x)}{\partial x_i} = \sum_{(a,b)\in B} \operatorname{tr} H_{ab}\,\frac{\partial \rho_{ab}}{\partial x_i}.$$


We shall now derive formulae for the components of the gradient, i. e., for 
the derivatives w. r. t. the different kinds of parameters. 


8.B.1 Derivatives w. r. t. the parameters for the local unitaries 

The derivative of a matrix exponential w.r.t. the components of the
exponentiated matrix (or of linear combinations of these) is a very involved problem.
Not only is the integral representation of this parametric derivative, though
simple, in no way obvious; the evaluation of this integral is also a highly
non-trivial matter. For a review of the history of this problem and the current
state of knowledge, consult Ref. [NH95].

For us, this is the main reason why we do not use the exponentiation of a
Hermitean matrix for the parametrisation of the local unitaries, but rather its
Cayley transform, for the latter involves only a matrix inverse, whose
parametric derivative is expressed by a simple formula: For any invertible
square matrix $A = A(t)$ that depends differentiably on a real parameter $t$, we
have

$$\frac{dA^{-1}}{dt} = -A^{-1}\,\frac{dA}{dt}\,A^{-1} \tag{8.23}$$

(for a proof, see e.g. [P1M]).
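Eq. (8.23) is easy to check numerically. The following short sketch compares the formula with a central finite difference for a matrix that depends linearly on $t$; the function name and the test matrices are arbitrary choices made for illustration.

```python
import numpy as np

def inverse_derivative(A, dA):
    """Derivative of the matrix inverse according to Eq. (8.23):
    d(A^-1)/dt = -A^-1 (dA/dt) A^-1."""
    A_inv = np.linalg.inv(A)
    return -A_inv @ dA @ A_inv

# Check against a central finite difference for A(t) = A0 + t*A1 at t = 0.
A0 = np.array([[2.0, 1.0], [0.0, 3.0]])
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
h = 1e-6
finite_difference = (np.linalg.inv(A0 + h * A1)
                     - np.linalg.inv(A0 - h * A1)) / (2 * h)
analytic = inverse_derivative(A0, A1)
```

The two results agree up to the accuracy of the finite difference.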

We need $n^2$ real parameters to parametrise a Hermitean $n \times n$ matrix,
which we arrange to form an $n \times n$ upper triangular matrix $A$ with real entries
on the diagonal, complex entries in the upper triangle and zeroes in the lower
triangle. $A + A^\dagger$ is now Hermitean and

$$U = \left(i\mathbb{1} + A + A^\dagger\right)\left(i\mathbb{1} - A - A^\dagger\right)^{-1} \tag{8.24}$$


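is unitary. Using Eq. (8.23), we get

$$\frac{\partial U}{\partial \begin{Bmatrix}\mathrm{Re}\\ \mathrm{Im}\end{Bmatrix} A_{kl}} = \begin{Bmatrix}1\\ i\end{Bmatrix} (\mathbb{1} + U)\left(|k\rangle\langle l| \begin{Bmatrix}+\\ -\end{Bmatrix} |l\rangle\langle k|\right)\left(i\mathbb{1} - A - A^\dagger\right)^{-1}, \tag{8.25}$$

$$\frac{\partial U^\dagger}{\partial \begin{Bmatrix}\mathrm{Re}\\ \mathrm{Im}\end{Bmatrix} A_{kl}} = -\begin{Bmatrix}1\\ i\end{Bmatrix} \left(i\mathbb{1} + A + A^\dagger\right)^{-1}\left(|k\rangle\langle l| \begin{Bmatrix}+\\ -\end{Bmatrix} |l\rangle\langle k|\right)(\mathbb{1} + U^\dagger).$$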
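The Cayley parametrisation and its derivative can be checked numerically with a few lines of NumPy. This is a hedged sketch and not an excerpt from "hwgs": the function names are hypothetical, and the derivative is written in the form $(\mathbb{1} + U)\,\frac{\partial (A + A^\dagger)}{\partial t}\,(i\mathbb{1} - A - A^\dagger)^{-1}$, which follows from applying Eq. (8.23) to the Cayley transform.

```python
import numpy as np

def cayley_unitary(A):
    """Unitary U = (i*1 + H)(i*1 - H)^-1 with Hermitian H = A + A^dagger."""
    n = A.shape[0]
    H = A + A.conj().T
    I = np.eye(n)
    return (1j * I + H) @ np.linalg.inv(1j * I - H)

def cayley_derivative(A, k, l, part):
    """Derivative of U w.r.t. Re A_kl (part='re') or Im A_kl (part='im')."""
    n = A.shape[0]
    I = np.eye(n)
    H = A + A.conj().T
    U = cayley_unitary(A)
    E = np.zeros((n, n), dtype=complex)
    E[k, l] = 1.0                            # the operator |k><l|
    if part == "re":
        dH = E + E.T                         # |k><l| + |l><k|
    else:
        dH = 1j * (E - E.T)                  # i(|k><l| - |l><k|)
    return (I + U) @ dH @ np.linalg.inv(1j * I - H)

# Upper triangular parameter matrix: real diagonal, complex upper triangle.
A_toy = np.array([[0.3, 0.2 + 0.5j], [0.0, -0.7]])
U_toy = cayley_unitary(A_toy)
```

Comparing `cayley_derivative` with a finite difference of `cayley_unitary` is a cheap test that is worth keeping in any implementation of the gradient.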




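We use this to calculate

$$\begin{aligned}
\frac{\partial E}{\partial A_{a,kl}} ={}& \sum_{b:(b,a)\in B} \operatorname{tr} H_{ba}\left[\left(U_b \otimes \frac{\partial U_a}{\partial A_{a,kl}}\right)\tilde\rho_{ba}\,(U_b \otimes U_a)^\dagger + (U_b \otimes U_a)\,\tilde\rho_{ba}\left(U_b \otimes \frac{\partial U_a}{\partial A_{a,kl}}\right)^\dagger\right] \\
&+ \sum_{c:(a,c)\in B} \operatorname{tr} H_{ac}\left[\left(\frac{\partial U_a}{\partial A_{a,kl}} \otimes U_c\right)\tilde\rho_{ac}\,(U_a \otimes U_c)^\dagger + (U_a \otimes U_c)\,\tilde\rho_{ac}\left(\frac{\partial U_a}{\partial A_{a,kl}} \otimes U_c\right)^\dagger\right],
\end{aligned}$$

where $\tilde\rho_{bc}$ is the reduced density matrix without application of the local
unitaries, i.e., $\tilde\rho_{bc} = (U_b \otimes U_c)^\dagger \rho_{bc}\,(U_b \otimes U_c)$.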


8.B.2 Derivatives w. r. t. the deformation parameters 

For the derivatives w. r. t. the parameters Re d[ s and Im d[ s , we have to take 
care of the normalisation of p a j, as it depends on those parameters. We abbre¬ 
viate the middle line of Eq. (8.12) with p a b and start with using Eq. (8.19) in 
order to see that


p ab = £ ocjOclD^Qp’^ 
i *=i 


where (according to Eq. (8.20)) 

V k - 


r,r' 6 S 


We write the derivative as 


dpab 


»{£}4 


= {U a © U b )W(p ab 


Pab 




tr pab 


W-< Pab (U a ®U b )\ (8.26) 


where the term in parentheses becomes 

dpab 


>{£} 


dl. 


tr Pab~ Pab tr 


dpab 




(tr pabY 


(8.27) 


In order to evaluate dp ab /d j ^ | d[, we distinguish three cases, namely (i) 
c = a, (ii) c = b, (iii) c £ {a, b}. 

Case (i): The only term in the middle line of Eq. (8.12) that depends on d l a 

v i’k 

is D ab and this only for those terms in the sum, where j = l or k = Z. One 
finds 

dpab 


■ \*l E 4 E d brAr[ d U M < r 10| © Pab + Hc - 


9 {lm} d «s *=i n,r\/ 2 es 

Case (ii): Analogous: 


f S & , = \ ■ \*1 E < E ^rAr', d W 2 M (02 | © pi + H <© 

9 {lm }4 1 J 00 S 


Case (iii): For c i {a, b}, D ! 'f is independent of d[, but p^ b is now depen¬ 
dent. We get: 


dpab _ f 1 ] Aa.*f) lk iA 

0}0W E ‘ * 


k= 1 


© 4 (exp i ( 0 / - 0 / + ^ 

where pf is given by Eq. (8.18). 


, eSJ ® O rf + H.c., 

eeV\{a,b,c} 
8.B.3 Derivatives w. r. t. the superposition coefficients

The derivatives w.r.t. $\operatorname{Re}\alpha_l$ and $\operatorname{Im}\alpha_l$ are found the same way as for the
deformation, and one gets


dpab 


^Pab 


-> ~ Re \ Pab Pab tr , R ” 

dnrmp ab _ 3 (i m / a 3 (im 


doc 


ah 


(tipabY 


with 


where 


dpab 


>{£}«< 


= £ 


k =1 


D 


(a,b) 


DJ is defined in Eq. (8.20). 


D 


(a,b) 


© p l Yb + H.C., 


8.B.4 Derivatives w. r. t. the phases 

We first introduce 


i®V2 


Wq> = |<f>) (0| with |<5>) = Y2 |r) e' 

res 2 

so that we can write —due to Eq. (8.21)— Eq. (8.12) as 

Pab = © iffe) (Wo„(, © Pab) (Ida © Ufc)- 

We have to take care that in case of a symmetrisation according to Sec.
8.3.4 a phase can occur more than once in the expression for $\rho_{ab}$, and hence,
we make use of the phase index mapping $\nu(a, b) \in \{1, \ldots, R\}$ (Sec. 8.3.4), which
associates with every pair of spins $a, b$ a phase matrix $\Phi_{\nu(a,b)}$. We write (with
$r = (r_1, r_2)$)


dpab 


tr Pab 


( ld a © LZfc) 


'9W 0 


mf 2 - ® ® ® u »y 


and proceed to discuss the two derivatives in this expression. 

The first one is evidently non-zero only if $\nu = \nu(a,b)$ and then evaluates
to

aw 0 


v v(a,b) 


30 r , m 

v{a,b) 


~ © 




r i} ^W v q' 2 } ,{n,r 2 } 


q,q'€S 2 


The set notation in the Kronecker deltas accounts for the fact that $\Phi$ is
symmetrised, $\Phi^{r_1 r_2} = \Phi^{r_2 r_1}$ (cf. again Sec. 8.3.4), and hence, the order of the
components of the vectors $r$, $q$, and $q'$ must be disregarded.

For the second term, we pull the derivative inwards 

s ^= £>,< 0*0 ^ 
then rewrite Eq. (8.15) as

A= O E&4'|*W<^ 


,S 

abc 


with 


and continue 


ceV\{a,b} s SS 


\V*c) = L l £ ?i ( ? 2 )exp 
qi,q 2 eS 


1 v\a,c) v[b,c) 


A 

9<E> r 

v eeV\{a,b} \ceV\{a,b,e} seS 


= E O E^'I^X*: 


abc I 


0 


0 Yj “ esl *es 
seS 


d j dk * d\& abe )(&abe\ 




The sum over $e$ formally runs over $N - 2$ terms. Most of these vanish,
however, namely all those for which neither $\nu = \nu(a, e)$ nor $\nu = \nu(b, e)$. For
translation-invariant phases, the number of remaining terms is of the order
of the coordination number of the lattice.

The derivative in the last line of the previous equation evaluates to

Chapter 9

Tensor networks 


This chapter is concerned with a class of states known as tensor network
states and a sub-class thereof, the tensor tree states. These states are a
generalization of the matrix-product states discussed in Sec. 6.4, and we hence
continue in the context laid out in Sec. 6.4.4.

We shall start with a precise definition of tensor network states and then
concentrate on the case where these networks are trees. In Ref. [MRS02] (see
also [Rod02]), a scheme called "DMRG for Trees" is used to study excitons on
dendrimers.¹ While their ansatz required that the interaction pattern of the
studied system has the form of a tree, the idea can be generalized to arbitrary
systems, as shown in the "tensor tree network" ansatz of Shi et al. [SDV06].
As we shall see, the advantage of tree states is that their expectation value for
any product observable can be calculated exactly by means of a "contraction",
in contrast to states with an underlying mesh structure such as the projected
entangled-pair states discussed in Sec. 6.4.4. Markov and Shi [MS07] have
studied when a tensor network can be contracted exactly and have given a
criterion on how far one may stray from the tree shape without losing this
ability, but we shall not pursue this point.

The work presented in this section is (as yet unpublished) work done in
collaboration with Robert Hübener, Caroline Kruszynska, Lorenz Hartmann,
Wolfgang Dür, and Hans J. Briegel. The presentation in this chapter focuses
on my parts in this work and does not go into much detail on certain aspects
investigated by my colleagues.

9.1 Definition 

Let us first define the notion of tensor networks in a very general form: A
tensor network is a connected (mathematical) graph such that with each edge
$\{a, b\}$ a positive integer $\chi(a,b)$ is associated, which we call the edge range, and
with each vertex $a$ of degree $\eta > 1$ a complex-valued tensor $T^{[a]}$ of rank $\eta$
is associated, where the ranges of the $\eta$ indices of $T^{[a]}$ are given by the edge
ranges. This means that for a vertex $a$ which is connected to the vertices
$b_1, b_2, \ldots, b_\eta$, the ranges of the indices of $T^{[a]}$ are

$$0, \ldots, \chi(a,b_1) - 1; \quad 0, \ldots, \chi(a,b_2) - 1; \quad \ldots; \quad 0, \ldots, \chi(a,b_\eta) - 1$$

and $T^{[a]}$ has $\prod_{i=1}^{\eta} \chi(a,b_i)$ complex entries. Only the vertices with degree $\eta > 1$
have associated tensors and these are called the tensor vertices, while we
call the vertices with degree 1 the leaf vertices. An edge between two tensor
vertices shall be called an inner edge, one leading to a leaf vertex a leaf edge.
Note that an inner edge connects two tensors, and each of them has one index
corresponding to this edge. The ranges of these two indices are the same
(namely the edge range), and hence, we may contract along such an edge.

1 For a review on dendrimers, see [Ino00].

A tensor network describes a quantum state by means of the following
convention: With each leaf vertex $a$ we associate a $d$-level quantum system,
where $d$ is the range of the edge incident on $a$. (Remember that leaf vertices
do not carry tensors. Hence, the range of the edge incident on the leaf
vertex may be chosen such that it matches the dimensionality of the system
we wish to describe, and the corresponding index of the tensor at the other
end of the edge then needs to have the same range.) Thus, if we have a
tensor network with $M$ tensor vertices with associated tensors $T^{[1]}, \ldots, T^{[M]}$
and $N$ leaf vertices $a_1, \ldots, a_N$ with corresponding edge ranges $d_1, \ldots, d_N$, we
describe a quantum state living in the Hilbert space

$$\mathcal{H} = \bigotimes_{i=1}^{N} \mathbb{C}^{d_i}.$$



This state is now formally written as

$$|\mathrm{TN}\rangle = \sum_{s_1=0}^{d_1-1} \cdots \sum_{s_N=0}^{d_N-1} \mathcal{T}\left(T^{[1]}, \ldots, T^{[M]};\, s_1, \ldots, s_N\right) |s_1, \ldots, s_N\rangle. \tag{9.1}$$

The operator $\mathcal{T}$ carries out a contraction as follows: Each leaf vertex $a_i$ is
connected by a leaf edge to one tensor vertex $b$ with tensor $T^{[b]}$, and this tensor
has an index corresponding to the leaf edge. All such indices are fixed to the
values $s_i$. The remaining indices of all the tensors correspond to inner edges,
over which a contraction is carried out. Carrying out these contractions will
result in a single scalar, namely the coefficient $\langle s_1, \ldots, s_N | \mathrm{TN}\rangle$ of the
computational basis expansion of $|\mathrm{TN}\rangle$.

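An example should make this clearer: Consider the tensor network in Fig.
9.1. It corresponds to the state

$$|\psi\rangle = \sum_{s_1=0}^{d_1-1} \cdots \sum_{s_5=0}^{d_5-1} |s_1 s_2 s_3 s_4 s_5\rangle \sum_{\alpha^{[1,2]}=0}^{\chi^{[1,2]}-1} \sum_{\alpha^{[2,3]}=0}^{\chi^{[2,3]}-1} T^{[1]}_{s_1 s_2 \alpha^{[1,2]}}\, T^{[2]}_{s_3 \alpha^{[1,2]} \alpha^{[2,3]}}\, T^{[3]}_{s_4 s_5 \alpha^{[2,3]}}.$$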
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[ 2 ] 



Figure 9.1: An example of a tensor network. Here, the graph is a tree.
Squares are leaf vertices and circles represent tensor vertices, both labelled
with their respective vertex indices. For tensor vertices, we put square
brackets around the index. The edges are labelled with their edge ranges, where
we use the convention to write $\chi$ for inner edges and $d$ for leaf edges.
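
The contraction for the network of Fig. 9.1 can be tried out numerically. The following is a minimal sketch, assuming NumPy, random real tensors, and freely chosen edge ranges `d` and `chi`; the variable names are illustrative and not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, chi = 2, 3  # assumed ranges of the leaf edges and of the inner edges

# Index order as in the example: T1[s1, s2, a12], T2[s3, a12, a23], T3[s4, s5, a23]
T1 = rng.normal(size=(d, d, chi))
T2 = rng.normal(size=(d, chi, chi))
T3 = rng.normal(size=(d, d, chi))

# Contract the two inner edges (a = alpha^[1,2], b = alpha^[2,3]);
# the five leaf indices stay open and label the basis states.
psi = np.einsum("pqa,rab,stb->pqrst", T1, T2, T3)

# Spot check of one coefficient against the explicit double sum.
s = (1, 0, 1, 1, 0)
coeff = sum(T1[s[0], s[1], a] * T2[s[2], a, b] * T3[s[3], s[4], b]
            for a in range(chi) for b in range(chi))
assert np.isclose(psi[s], coeff)
```

Storing all coefficients, as done here, costs $d^5$ numbers and is only feasible for small examples; the point of the formalism is that one never needs to do this.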

9.2 Tensor trees 

In this section we explore how tensor trees can be treated numerically, as this 
is the ansatz that is currently being researched in our group. 

The graph of Fig. 9.1 is a tree, i. e., a connected graph without loops. This
special structure allows us to calculate for the associated state $|\psi\rangle$ exactly and
efficiently the expectation value of any observable of the product form

$$O = \bigotimes_{i=1}^{N} O_i. \qquad (9.2)$$

Consequently, the same is true for sums of such product observables. Staying
with the example of Fig. 9.1, we write (using now $\alpha$ for $\alpha^{[1,2]}$ and $\beta$ for $\alpha^{[2,3]}$ to
make the equation more readable)

$$\langle\psi|O|\psi\rangle = \sum_{s_1', s_1 = 0}^{d_1-1} \cdots \sum_{s_5', s_5 = 0}^{d_5-1} \left( \prod_{i=1}^{5} \langle s_i' | O_i | s_i \rangle \right) \sum_{\alpha', \alpha = 0}^{\chi^{[1,2]}-1} \sum_{\beta', \beta = 0}^{\chi^{[2,3]}-1} T^{[1]*}_{s_1' s_2' \alpha'} T^{[1]}_{s_1 s_2 \alpha} \, T^{[2]*}_{s_3' \alpha' \beta'} T^{[2]}_{s_3 \alpha \beta} \, T^{[3]*}_{s_4' s_5' \beta'} T^{[3]}_{s_4 s_5 \beta}. \qquad (9.3)$$

At first glance, this expression is a sum over exponentially many terms.
However, we can rearrange terms by splitting the tree into subtrees. For
example, if we split along the inner edge [2,3], the sums over $\beta'$ and $\beta$ run over
products of two factors, each of which corresponds to one subtree and can be
calculated independently of the other:

$$\langle\psi|O|\psi\rangle = \sum_{\beta', \beta = 0}^{\chi^{[2,3]}-1} B^{[2]}(\beta', \beta) \, B^{[3]}(\beta', \beta).$$

Here, the factor corresponding to the subtree below tensor vertex [3] is

$$B^{[3]}(\beta', \beta) = \sum_{s_4', s_4 = 0}^{d_4-1} \sum_{s_5', s_5 = 0}^{d_5-1} T^{[3]*}_{s_4' s_5' \beta'} T^{[3]}_{s_4 s_5 \beta} \, \langle s_4' | O_4 | s_4 \rangle \langle s_5' | O_5 | s_5 \rangle,$$

and the factor for the subtree below tensor vertex [2] is again divided up into
factors corresponding to the subtree below tensor vertex [1] and to spin 3,

$$B^{[2]}(\beta', \beta) = \sum_{\alpha', \alpha = 0}^{\chi^{[1,2]}-1} \sum_{s_3', s_3 = 0}^{d_3-1} T^{[2]*}_{s_3' \alpha' \beta'} T^{[2]}_{s_3 \alpha \beta} \, B^{[1]}(\alpha', \alpha) \, \langle s_3' | O_3 | s_3 \rangle,$$

and the last subtree is evaluated as

$$B^{[1]}(\alpha', \alpha) = \sum_{s_1', s_1 = 0}^{d_1-1} \sum_{s_2', s_2 = 0}^{d_2-1} T^{[1]*}_{s_1' s_2' \alpha'} T^{[1]}_{s_1 s_2 \alpha} \, \langle s_1' | O_1 | s_1 \rangle \langle s_2' | O_2 | s_2 \rangle.$$

In practice, we have to work from bottom to top: We first compute $B^{[1]}$,
from this $B^{[2]}$, and combine the result with $B^{[3]}$ to get $\langle\psi|O|\psi\rangle$.

Let us define a tensor tree state to be any state associated with a root-less
binary tree graph. By a root-less binary tree we mean a loop-less graph (such
as the one in Fig. 9.1) with all vertices being either of degree 3 (tensor vertices)
or 1 (leaf vertices).$^2$ Then, there are as many $B$ terms as there are tensors, and
in order to calculate one of them, we have to carry out a quadruple sum of
$O(\chi^4)$ terms, where $\chi$ is the maximum of all the $\chi_j$ and $d_i$. Since a root-less
binary tree with $N$ leaves has $N - 2$ tensor vertices, the time to calculate
an expectation value of a product observable for a tensor tree state scales as
$O(N\chi^4)$. These facts were first pointed out in Ref. [MS07] and more
explicitly (though using another notation) in Ref. [SDV06].
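
The bottom-to-top evaluation for the five-spin example can be checked against a brute-force computation. This is a sketch under assumptions: NumPy, random complex tensors and random local operators, and freely chosen ranges `d` and `chi`; none of these names come from the text.

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(1)
d, chi = 2, 3  # assumed ranges of leaf and inner edges

def rand_c(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Index order as in the example: T1[s1,s2,alpha], T2[s3,alpha,beta], T3[s4,s5,beta]
T1, T2, T3 = rand_c(d, d, chi), rand_c(d, chi, chi), rand_c(d, d, chi)
ops = [rand_c(d, d) for _ in range(5)]  # O_1..O_5 with ops[i][s', s] = <s'|O|s>

# Upper-case letters are the primed (bra) indices.
B1 = np.einsum("PQA,pqa,Pp,Qq->Aa", T1.conj(), T1, ops[0], ops[1])
B3 = np.einsum("STB,stb,Ss,Tt->Bb", T3.conj(), T3, ops[3], ops[4])
B2 = np.einsum("RAB,rab,Aa,Rr->Bb", T2.conj(), T2, B1, ops[2])
value = np.sum(B2 * B3)  # sum over beta', beta of B2 * B3

# Brute-force reference: all d^5 coefficients and the full d^5 x d^5 operator.
psi = np.einsum("pqa,rab,stb->pqrst", T1, T2, T3).reshape(-1)
brute = psi.conj() @ reduce(np.kron, ops) @ psi
assert np.isclose(value, brute)
```

The reference computation grows exponentially with the number of spins, whereas the three `B` contractions each involve only one tensor and its immediate surroundings.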

To implement this contraction numerically in an elegant fashion, one 
should use recursion: 3 First, we choose a root edge, i. e. an edge through 
which to make the first split. In the example above, we had chosen [2,3] as 
root edge. Then, we call the function calculate_observable, which is given 
in pseudo-code in Table 9.1. This function then calls calculate_vertex_value 
(Table 9.2), which takes three arguments, a vertex index b and two edge 

2 In graph theory, a graph is called a tree if it is connected and has no loops. It is called a
binary tree if each vertex has either degree 3 or degree 1 (the latter are called leaves) except for
one vertex that has degree 2 and is called the root. By convention, "trees grow downwards",
i. e., they are always drawn with the root on top and the leaves at the bottom. Hence, a
tree in graph theory does not have a height as natural trees do, but a depth, which is the
maximum distance that a leaf has from the root. In our case, it turns out to simplify the formalism
if one does not have a root vertex, and hence, our trees are "root-less". An alternative term to
denote this kind of tree is Cayley tree of degree 3. (A Cayley tree of degree d is a tree that only
contains vertices of degree d or degree 1.)

3 A reader unfamiliar with the standard algorithms and techniques to treat trees in computer
programming might find the relevant chapters of the classic textbook [Sed88] helpful.




function calculate_observable (root edge ($b_l$, $b_r$))
    v ← 0
    for $i'$ from 0 to $\chi(b_l, b_r) - 1$
        for $i$ from 0 to $\chi(b_l, b_r) - 1$
            v ← v + calculate_vertex_value ($b_l$, $i'$, $i$) ×
                  × calculate_vertex_value ($b_r$, $i'$, $i$)
        end for
    end for
    return v
end function


Table 9.1: The function calculate_observable in pseudo-code notation.


indices $i'$, $i$. Its purpose is to calculate, when called for a tensor vertex $b$, the
value $B^{[b]}(i', i)$ (by means of recursively calling itself), and when called for a
leaf vertex $b$, the value $\langle i' | O_b | i \rangle$.
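
The recursion of Tables 9.1 and 9.2 might be sketched in Python as follows. This is not the implementation used for the thesis: it assumes NumPy, a hypothetical data layout (`tensors`, `neighbours`, `leaf_ops`), and it returns the whole matrix $B^{[b]}(i', i)$ per call instead of a single entry, so the cache is keyed by the vertex and the neighbor lying towards the root edge.

```python
import numpy as np
from functools import lru_cache

def expectation(tensors, neighbours, leaf_ops, root_edge):
    """Expectation value of a product observable for a tensor tree state.

    tensors[b]    : ndarray with one axis per entry of neighbours[b], same order
    neighbours[b] : the three vertices adjacent to tensor vertex b
    leaf_ops[a]   : matrix <i'|O_a|i> of the local operator at leaf vertex a
    root_edge     : pair of adjacent vertices through which the first split is made
    """
    @lru_cache(maxsize=None)          # plays the role of the cache in Table 9.2
    def B(b, parent):
        if b in leaf_ops:             # leaf vertex
            return leaf_ops[b]
        # Move the axis pointing towards the root edge to the front: T[p, q, r].
        T = np.moveaxis(tensors[b], neighbours[b].index(parent), 0)
        left, right = [n for n in neighbours[b] if n != parent]
        # sum over j', j, k', k of T*[i',j',k'] T[i,j,k] B_l[j',j] B_r[k',k]
        return np.einsum("IJK,ijk,Jj,Kk->Ii", T.conj(), T, B(left, b), B(right, b))

    bl, br = root_edge
    return np.sum(B(bl, br) * B(br, bl))

# Usage with the tree of Fig. 9.1 (leaves 1..5, tensor vertices "t1".."t3").
rng = np.random.default_rng(2)
d, chi = 2, 3
tensors = {"t1": rng.normal(size=(d, d, chi)),
           "t2": rng.normal(size=(d, chi, chi)),
           "t3": rng.normal(size=(d, d, chi))}
neighbours = {"t1": [1, 2, "t2"], "t2": [3, "t1", "t3"], "t3": [4, 5, "t2"]}
leaf_ops = {k: rng.normal(size=(d, d)) for k in range(1, 6)}
value = expectation(tensors, neighbours, leaf_ops, ("t2", "t3"))
```

The result does not depend on which edge is chosen as root edge, which makes a convenient consistency check.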


9.3 Other variants of tensor networks 

Since Östlund and Rommer's realization of the connection between DMRG
and matrix product states (see Sec. 6.4.2), a host of variations on this theme
have been studied, as we have already seen in the overview given in Sec.
6.4.4. All these state classes can be conveniently described in a unified
fashion in the language of tensor networks. We shall see that they fit into the
general form of the definition given in Sec. 9.1. However, observables cannot
be calculated efficiently for all possible tensor networks; we require a tree
structure for the scheme described in Sec. 9.2 to work, and a deviation from
this tree structure is only possible up to a certain extent [MS07], unless one is
willing to use approximate techniques.

9.3.1 Matrix product states 

It is easy to see that matrix product states fit into the definition of a tensor
network state: While the quantity $A^{[a]s_a}$ associated with a site $a$ in Eqs. (6.1,
6.2) is usually seen as a tuple of $d$ matrices, each of size $\chi \times \chi$, one can, of
course, just as well consider $A^{[a]}$ as one tensor of shape $\chi \times \chi \times d$. (Note that we
now use $\chi$ to denote the quantity called $D$ before.) Then, the matrix
multiplication is exactly the same operation as the contraction $\mathcal{T}$ associated with the
graphs depicted in Fig. 9.2.

In a matrix product state with periodic boundary conditions, all tensors
have rank 3, and consequently, all tensor vertices in Fig. 9.2b have degree 3.
In a matrix product state with open boundary conditions, however, the tensors
at the ends have only rank 2 and shape $\chi \times d$. In the original formulation of
Eq. (6.1), this has been reflected by writing the quantities controlling the end
spins as $d$-tuples of $\chi$-dimensional vectors.

function calculate_vertex_value (vertex $b$, index $i'$, index $i$):
    if $b$ is a leaf vertex then
        return $\langle i' | O_b | i \rangle$
    else (i. e., if $b$ is a tensor vertex)
        if, during the calculation for the current observable, this function
        has already been called with the same arguments as now
        then
            return the value calculated last time by looking
            it up in the cache
        else
            Denote the 3 vertices adjacent to $b$ as follows:
            The vertex closest to the root edge is $b_u$; the other two
            vertices are (arbitrarily but consistently) $b_l$ and $b_r$.
            In the following, read the tensor elements $T^{[b]}_{pqr}$ such that the
            first index ($p$) corresponds to the edge to $b_u$, the second one
            ($q$) to the edge to $b_l$ and the third one ($r$) to the edge to $b_r$.

            v ← 0
            for $j'$ from 0 to $\chi(b, b_l) - 1$
                for $j$ from 0 to $\chi(b, b_l) - 1$
                    for $k'$ from 0 to $\chi(b, b_r) - 1$
                        for $k$ from 0 to $\chi(b, b_r) - 1$
                            v ← v + $T^{[b]*}_{i'j'k'} T^{[b]}_{ijk}$ ×
                                  × calculate_vertex_value ($b_l$, $j'$, $j$) ×
                                  × calculate_vertex_value ($b_r$, $k'$, $k$)
                        end for
                    end for
                end for
            end for

            Store v in the cache as result for the arguments ($b$, $i'$, $i$).
            return v
        end if
    end if
end function


Table 9.2: The function calculate_vertex_value in pseudo-code notation.

Figure 9.2: Tensor networks associated with a 5-spin matrix product state
(spin dimension $d$, matrix dimension $\chi \times \chi$) with (a) open and (b) periodic
boundary conditions. As in Fig. 9.1 before, squares indicate spin vertices
(indexed with plain figures), and circles tensor vertices (indexed with figures
in square brackets). Edges are labelled with their edge range.

We see immediately that the open-boundary matrix product state is a
tensor tree state, quite similar to the tree in Fig. 9.1. The difference is that the
lowest tensors in Fig. 9.1 each have two leaf edges while in Fig. 9.2a, they
have one leaf edge.
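
That the matrix multiplication and the network contraction coincide can be seen in a small open-boundary example. The sketch below assumes NumPy and random real tensors with the shapes named above ($\chi \times d$ at the ends, $\chi \times \chi \times d$ otherwise); the function name `coefficient` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, chi = 5, 2, 3

# Open boundary conditions: end tensors of shape (chi, d), inner ones (chi, chi, d).
A = ([rng.normal(size=(chi, d))]
     + [rng.normal(size=(chi, chi, d)) for _ in range(N - 2)]
     + [rng.normal(size=(chi, d))])

def coefficient(A, s):
    """<s_1 ... s_N | psi> as vector * matrix * ... * matrix * vector."""
    v = A[0][:, s[0]]                  # vector of length chi for the first spin
    for a in range(1, len(A) - 1):
        v = v @ A[a][:, :, s[a]]       # chi x chi matrix selected by spin level s_a
    return v @ A[-1][:, s[-1]]

# The same number from contracting the chain-shaped network of Fig. 9.2a.
psi = np.einsum("ap,abq,bcr,cds,dt->pqrst", *A)
levels = (0, 1, 1, 0, 1)
assert np.isclose(coefficient(A, levels), psi[levels])
```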

Note that we can efficiently evaluate an observable for a periodic-boundary
matrix product state (using Eq. (6.3)) although the associated
graph is not a tree, and the presence of a loop makes a contraction with the
algorithm of Sec. 9.2 impossible. This is because a contraction is still possible,
in a limited sense, even in the presence of loops. This is explored in detail in
[MS07].

9.3.2 Projected entangled-pair states 

In [VC04a] Verstraete and Cirac have proposed a way to generalize matrix
product states to the 2D case, inspired by the idea of valence-bond states
(whose possible uses for quantum information they had studied before
[VC04b]). Fig. 9.3 shows such a so-called projected entangled-pair state
(PEPS) on a periodic 2D square lattice. Each tensor (depicted in the figure, as
before, by a round bullet) now has five indices: One index specifies the spin
level (represented in the figure by the edge to the square leaf) and hence runs
from 0 to $d - 1$; the other four indices connect to the four neighboring tensors
and run from 0 to $\chi - 1$, where $\chi$ is, as before, the edge range (denoted $D$ in
Verstraete and Cirac's papers). As in the matrix product states, each tensor
controls precisely one spin to which it is directly connected. Note that this
is not the case for the tensor tree states.

Figure 9.3: Tensor networks associated with a PEPS on a 4 x 4 lattice with
periodic boundary conditions.

Figure 9.4: Construction of the same state as in Fig. 9.3 in the valence bond
picture.

The one-to-one correspondence of tensors and spins allows for an alternative
understanding of the construction of the states, namely in the picture of
valence-bond states:$^4$ Each bond of the square lattice is modelled by a
maximally entangled state of a pair of auxiliary $\chi$-level spins, i. e., a state of
Schmidt rank $\chi$ (see Fig. 9.4). At each site, four such auxiliary $\chi$-level spins can now
be found, each being one end of the pairs forming the four incident edges.
The tensor $T^{[a]}$ associated with a site $a$ is seen as a projector

$$\sum_{s=0}^{d-1} \sum_{r_1, r_2, r_3, r_4 = 0}^{\chi-1} T^{[a]}_{s r_1 r_2 r_3 r_4} \, |s\rangle \langle r_1 r_2 r_3 r_4|$$

that maps down from the four-spin auxiliary space $(\mathbb{C}^{\chi})^{\otimes 4}$ to the space $\mathbb{C}^{d}$
of the physical spin. (In the case of open boundary conditions, the tensors
at the lattice edges have one index less, and at the corners, two indices less.)
The value of the construction is that it immediately gives an upper bound on
the block-wise entanglement of a PEPS with respect to any bipartition. As a
maximally entangled state of two $\chi$-level systems has an entropy of
entanglement of $\log_2 \chi$, each bond that crosses the boundary area of the bipartition
can contribute this amount to the block-wise entanglement. Hence, the
entanglement with respect to any bipartition of the lattice that a PEPS can have
is bounded from above by $\log_2 \chi$ times the number of bonds that cross the
boundary area, and this bound can be saturated. As we shall see in the next
section (Sec. 9.4), this property is crucial for making the PEPS a class of states
suitable for 2D and 3D systems.
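
The bound can be made concrete by counting crossed bonds. The following sketch uses only the Python standard library; the function name `crossing_bonds` and the chosen bipartition (left half against right half of a 4 x 4 lattice) are illustrative assumptions.

```python
import math

def crossing_bonds(L, block, periodic=True):
    """Count nearest-neighbour bonds of an L x L square lattice that have
    exactly one end inside `block` (a set of (x, y) sites). Assumes L > 2
    in the periodic case so that no bond is visited twice."""
    count = 0
    for x in range(L):
        for y in range(L):
            for dx, dy in ((1, 0), (0, 1)):      # visit each bond once
                nx, ny = x + dx, y + dy
                if periodic:
                    nx, ny = nx % L, ny % L
                elif nx >= L or ny >= L:
                    continue
                if ((x, y) in block) != ((nx, ny) in block):
                    count += 1
    return count

L, chi = 4, 3
left_half = {(x, y) for x in range(L // 2) for y in range(L)}
n_cut = crossing_bonds(L, left_half)             # periodic lattice: two cuts per row
bound_in_bits = n_cut * math.log2(chi)           # upper bound on the entanglement
```

With periodic boundary conditions the cut crosses two bonds in each of the four rows, with open boundary conditions only one.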

This advantage is bought at the price of a considerably more complicated
procedure to evaluate observables. Because of the many loops inherent in the mesh
underlying a PEPS, any straightforward contraction in the sense of the
procedure of Sec. 9.2 (or the more general procedure of Ref. [MS07]) is not possible.
A simple way to contract the network is as follows: If one contracts an edge,
two tensors, which both have further indices, will be joined to one tensor,
which now may have up to 10 indices.$^5$ The tensor network now contains
two triangles, and contracting an edge of such a triangle causes the other two
edges to come to lie on top of each other, i. e., the two edges of range $\chi$
become one edge of range $\chi^2$. Then, one truncates the two tensors joined by
this edge by means of a singular value decomposition in order to reduce the
edge dimension to $\chi$. In this way, one contracts the whole network, always
using singular value decompositions to truncate any edge which has been
"blown up". However, as numerical experiments done by V. Murg and I.
Cirac [pers. comm.] have shown, this procedure is not only computationally
rather expensive but also, and especially, fails to reach satisfactory precision.
The approximation error due to the repeated local truncations adds up too fast.

4 This class of states was generalized from the expression used to describe the so-called
valence-bond solid phase of certain antiferromagnetic spin systems [AKLT87]. They were
used as a conceptual aid to understand the entanglement properties of matrix product states
in [VPC04], and are also handy as an alternative construction principle for graph states, cf.
Sec. 5.2.1.

5 When calculating the expectation value $\langle\psi|O|\psi\rangle$ of an observable $O$, each tensor appears
twice, once in the ket $|\psi\rangle$ and once as its complex conjugate in the bra $\langle\psi|$. When carrying out
the double sum associated with the bra and ket states of the spin connected to the tensor, the
edge range of the inner edges rises from $\chi$ to $\chi^2$. Alternatively, if one starts by contracting an
inner edge between two tensors first, one is left with a tensor with six inner edges (each of
the original tensors has three further neighbors) and four leaf edges (the spins of both tensors,
each as bra and as ket spin). This results in a tensor with $\chi^6 d^4$ elements.
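
A single truncation step of this kind might look as follows. The sketch assumes NumPy and two hypothetical tensors that have already been reshaped to matrices `X` and `Y` joined by the merged edge of range $\chi^2$; the sizes 10 and 12 of the remaining indices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
chi = 3
# Hypothetical pair of tensors joined by a merged edge of range chi**2:
# (other indices of X) x chi^2 and chi^2 x (other indices of Y).
X = rng.normal(size=(10, chi**2))
Y = rng.normal(size=(chi**2, 12))

U, S, Vh = np.linalg.svd(X @ Y, full_matrices=False)
X_new = U[:, :chi] * S[:chi]      # absorb the kept singular values into the left tensor
Y_new = Vh[:chi, :]               # the edge between X_new and Y_new has range chi again

error = np.linalg.norm(X @ Y - X_new @ Y_new)
# The Frobenius error equals the norm of the discarded singular values.
assert np.isclose(error, np.linalg.norm(S[chi:]))
```

Each such step is optimal only locally, for the pair of tensors at hand, which is consistent with the observation that the errors of many successive truncations accumulate.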

Hence, they have devised a more sophisticated scheme [VC04a, MVC07],
which, however, works well only for open boundary conditions. In an $L \times L$
square lattice, one contracts over all the vertical bonds between the first and
second row of spins. This causes the horizontal edges of these rows to merge
to edges of range $\chi^2$. Now, a PEPS state of size $(L - 1) \times L$, consisting only of
bonds with range $\chi$, is sought that approximates this state in an optimal way.
The tensors of the $L - 2$ upper rows of the new state stay the same as before;
the tensors of the now lowest row are optimized, by a horizontal DMRG-style
sweeping, to minimize the distance between the two states. Then, this row
of newly calculated tensors is again contracted with the tensors above, and
this is iterated until the whole network is contracted to just a single row. The
advantage is that, due to the DMRG sweeps, a much better approximation of
the true expectation value is achieved. The disadvantage is that the
necessity to start at an edge seems to preclude the treatment of periodic boundary
conditions. This is a rather serious impediment, as it renders extrapolations
to the thermodynamic limit much harder, despite the otherwise good results
of the method in open-boundary cases.


9.4 Tensor networks and area laws 

The scaling of block-wise entanglement with block size is a hot topic of
current research. The starting point of this line of research lies in a rather
different realm of physics: in black holes, which are objects of maximum entropy
in general relativity, the entropy does not scale with the volume of the black
hole, but with the surface area of its event horizon. This was first conjectured
in a seminal article by Bekenstein [Bek73], and later proved more rigorously.
Such a scaling of entropy-like quantities is, however, not restricted to such
extreme cases, as was seen considerably later.

For a quantum system in a pure state with a bipartition into sub-systems
$A$ and $B$, the blockwise entanglement with respect to this bipartition is
defined as $S_{A:B} = -\operatorname{tr} \rho_A \log_2 \rho_A = -\operatorname{tr} \rho_B \log_2 \rho_B$, where $\rho_A$ ($\rho_B$) is the reduced
density matrix of the system state obtained when tracing over the degrees
of freedom of sub-system $B$ ($A$). Remarkably, in the ground states of several
very general classes of quantum systems, this quantity is not proportional
to the volume of one of the sub-systems but to the area of the boundary
of the bipartition. Such an area law was first proved for harmonic
oscillators in a chain [AEPW02] or on a lattice [PEDC05], then shown to be true as
well for a large class of non-critical bosonic systems in arbitrary dimension
[CEPD06]. For fermions, a logarithmic correction occurs [CEP07, Wol06]. In
critical systems, entanglement may grow faster (by a logarithmic factor)
than the boundary area of the block (studied for chains in [Kor04, KM05] and
for higher dimensions in [BCS06]). A proof of an area law at finite
temperature has recently been found, too [WVHC07].

Figure 9.5: A natural way to connect spins on a 4 x 4 lattice to a degree-3
Cayley tree.
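
For small systems the blockwise entanglement can be evaluated directly from the state vector. The sketch below assumes NumPy and a normalized pure state; it uses the Schmidt coefficients (singular values) rather than an explicit reduced density matrix, which gives the same entropy.

```python
import numpy as np

def blockwise_entanglement(psi, dim_A, dim_B):
    """S_{A:B} = -tr rho_A log2 rho_A for a normalized pure state on A (x) B."""
    schmidt = np.linalg.svd(np.reshape(psi, (dim_A, dim_B)), compute_uv=False)
    p = schmidt**2                    # eigenvalues of rho_A (and of rho_B)
    p = p[p > 1e-15]                  # convention 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>) / sqrt(2)
product = np.kron([1, 0], [0, 1])            # |0>|1>

S_bell = blockwise_entanglement(bell, 2, 2)        # one bit for a Bell pair
S_product = blockwise_entanglement(product, 2, 2)  # no entanglement
```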

These results are very important when assessing whether a given class of
variational states is suitable to approximate well the ground state of some
system. As even for non-critical 2D systems the blockwise entanglement
grows linearly with the boundary of the block, it is imperative to require
that the variational trial states do the same. The PEPS class fulfills this
requirement, and the valence-bond picture described in the previous section
(see Fig. 9.4) shows this in an intuitive way: If a bipartition of a 2D lattice is
chosen, the number of lattice bonds that the boundary crosses is proportional
to its length. We require the blockwise entanglement to scale linearly in the
length of the boundary, too, and, in fact, this can be achieved: The maximally
entangled auxiliary state associated with each crossed bond contributes an
amount of up to $\log_2 \chi$ to the entanglement, and these contributions add up.
Hence, the PEPS states are well suited to approximate ground states of all
systems for which the area law holds [VC04a] and even other systems that
can be reduced to such cases [VWPC06].

We may now attempt to use tensor tree states to describe systems on 2D
lattices. A natural way to connect the spins on a 2D lattice to a degree-3
Cayley tree is shown in Fig. 9.5. A bipartition of the lattice along the vertical
median divides the tree into two parts connected by just one edge. It is easy
to see that, for any system size $L \times L$ and any way of building the Cayley tree,
there is always a bipartition of boundary length at least $L$ that cuts the lattice
such that only a single edge connects the two parts. The blockwise
entanglement of the tensor tree state with respect to such a bipartition is bounded
by $\log_2 \chi$, for reasons analogous to those discussed in the preceding section.
This bound does not grow with $L$, while the blockwise entanglement of most
ground states is expected to do so. In order to keep up, $\chi$ has to be increased
exponentially with $L$! It is thus unlikely that tensor tree states alone are a
viable variational class for two or higher dimensions.
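
The exponential growth is easy to tabulate. The sketch below uses only the standard library and rests on a purely illustrative assumption, namely that the ground state carries `c` bits of entanglement per unit of boundary length, so that $S = cL$ across the cut; the value `c = 0.5` is not taken from the text.

```python
import math

def min_edge_range(entropy_bits):
    """Smallest chi with log2(chi) >= entropy_bits."""
    return math.ceil(2 ** entropy_bits)

c = 0.5  # assumed entanglement (in bits) per unit of boundary length
required = {L: min_edge_range(c * L) for L in (4, 8, 16, 32)}
for L, chi in required.items():
    print(L, chi)
```

Doubling the linear system size squares the required edge range, while the contraction cost grows polynomially in $\chi$.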

A way out of this dilemma may be to add additional features to the class 
of states that allow for stronger entanglement. As we shall see in Sec. 9.6 it 
is possible to let weighted phase gates of the same kind as discussed in Ch. 
8 act on all pairs of spins without losing the ability to efficiently calculate 
expectation values of observables. But first we discuss how to optimize the 
tensors of a tensor tree state (without weighted phase gates) such that the 
energy is minimized. 


9.5 Optimizing tensor tree states 

We now want to use tensor tree states as test states for a variational method,
i. e., for a given system Hamiltonian $H$ and a tensor network $\mathcal{T}$ describing a
state $|\Psi(\mathcal{T})\rangle$, we want to find those values for the tensor entries that minimize
the energy

$$\frac{\langle\Psi(\mathcal{T})|H|\Psi(\mathcal{T})\rangle}{\langle\Psi(\mathcal{T})|\Psi(\mathcal{T})\rangle}. \qquad (9.4)$$

Note first that $|\Psi(\mathcal{T})\rangle$ is a valid state for any values of the tensor entries unless
certain linear dependencies (to be discussed later) reduce the state to the
zero vector. However, $|\Psi(\mathcal{T})\rangle$ cannot be expected to be normalized.

As the general dependency of the energy on the individual tensor entries
may be complicated, we may not expect that finding the global minimum is
straightforward, efficient or possible at all. Fortunately, it turns out that if
one holds all tensor values fixed except for those in one tensor, the map of
these values to the energy takes the form of a so-called generalized Rayleigh
quotient, which can be globally minimized by solving a so-called generalized
eigenvalue problem (see e. g. [NW99]).

To this end, let us pick out one tensor, $T^{[a]}$, and see how the expectation
value of a product observable $O$ of form (9.2) depends on the entries of $T^{[a]}$
if all other tensors are considered fixed. Let us write $b_1, b_2, b_3$ for the three
neighbors of the tensor vertex $a$, and $\chi_i = \chi(a, b_i)$ ($i = 1, 2, 3$) for the edge
ranges of the three edges. Generalizing Eq. (9.3), we see that each tensor is
associated with three double sums over its three indices. The tensor appears
twice, once with and once without complex conjugation, and is multiplied
with factors that only depend on the other tensors and the observable and
are hence constant as we consider the other tensors fixed. We assemble these
constants to a tensor $\mathcal{O}$ and get

$$\langle\Psi(\mathcal{T})|O|\Psi(\mathcal{T})\rangle = \sum_{\alpha_1', \alpha_1 = 0}^{\chi_1-1} \sum_{\alpha_2', \alpha_2 = 0}^{\chi_2-1} \sum_{\alpha_3', \alpha_3 = 0}^{\chi_3-1} T^{[a]*}_{\alpha_1' \alpha_2' \alpha_3'} \, \mathcal{O}_{\alpha_1' \alpha_2' \alpha_3' \alpha_1 \alpha_2 \alpha_3} \, T^{[a]}_{\alpha_1 \alpha_2 \alpha_3}. \qquad (9.5)$$



The elements of the tensor $\mathcal{O}$ can be calculated explicitly. (Keep in mind
that for each active tensor $a$, we get a different tensor $\mathcal{O}$.) Let us
denote by $\tilde{\mathcal{T}}(\alpha_1, \alpha_2, \alpha_3)$ the tensor network $\mathcal{T}$ with the following alteration: In
$\tilde{\mathcal{T}}(\alpha_1, \alpha_2, \alpha_3)$, all tensors $T^{[b]}$ are the same as in $\mathcal{T}$ for $b \neq a$, but the tensor
$T^{[a]}$ is replaced by a tensor that has zeroes in all entries except for the entry
$T^{[a]}_{\alpha_1 \alpha_2 \alpha_3}$, which is 1. With this definition, it is easy to see that

$$\mathcal{O}_{\alpha_1' \alpha_2' \alpha_3' \alpha_1 \alpha_2 \alpha_3} = \langle\Psi(\tilde{\mathcal{T}}(\alpha_1', \alpha_2', \alpha_3'))| \, O \, |\Psi(\tilde{\mathcal{T}}(\alpha_1, \alpha_2, \alpha_3))\rangle.$$

Hence, we can use the algorithm described in Sec. 9.2 to compute the tensor
$\mathcal{O}$. (Note that the facts just presented for the product observable $O$ also hold
for any sum of product observables, such as the system Hamiltonian $H$.) We
only have to use the contraction algorithm for each term in $H$ in order to
obtain a tensor $\mathcal{E}$ that plays, for $H$, the same role as $\mathcal{O}$ does for $O$ in Eq. (9.5).

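We now come back to the task of minimizing the expression (9.4). We
assemble the index triple $(\alpha_1, \alpha_2, \alpha_3)$ that occurred in Eq. (9.5) to a single index

$$\alpha = \alpha_1 + \chi_1 \alpha_2 + \chi_1 \chi_2 \alpha_3.$$

Indexed in this manner, the three-index tensor $T^{[a]}$ becomes a one-index
tensor $T^{[a]}_{\alpha}$, i. e., a vector. Likewise, we use the same indexing to regard the
6-index tensor $\mathcal{O}$ as a matrix, and write henceforth $\mathcal{O}_{\alpha' \alpha}$ for
$\mathcal{O}_{\alpha_1' \alpha_2' \alpha_3' \alpha_1 \alpha_2 \alpha_3}$. Eq. (9.5) then becomes

$$\langle\Psi(\mathcal{T})|O|\Psi(\mathcal{T})\rangle = \sum_{\alpha', \alpha = 0}^{\chi_1 \chi_2 \chi_3 - 1} T^{[a]*}_{\alpha'} \, \mathcal{O}_{\alpha' \alpha} \, T^{[a]}_{\alpha} = T^{[a]\dagger} \mathcal{O} \, T^{[a]}.$$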
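
In array libraries this combined index corresponds to a column-major flattening. The sketch below assumes NumPy and arbitrary small edge ranges; `flat_index` is an illustrative helper, not a name from the text.

```python
import numpy as np

chi1, chi2, chi3 = 2, 3, 4
xi = chi1 * chi2 * chi3
rng = np.random.default_rng(5)
T = rng.normal(size=(chi1, chi2, chi3))

def flat_index(a1, a2, a3):
    return a1 + chi1 * a2 + chi1 * chi2 * a3

# Column-major ("Fortran") flattening realises exactly this index map.
vec = T.reshape(-1, order="F")
assert all(vec[flat_index(a1, a2, a3)] == T[a1, a2, a3]
           for a1 in range(chi1) for a2 in range(chi2) for a3 in range(chi3))

# The six-index tensor becomes a matrix with row index alpha' and column index alpha.
O6 = rng.normal(size=(chi1, chi2, chi3) * 2)
O_mat = O6.reshape(xi, xi, order="F")

# The triple double sum of Eq. (9.5) and the quadratic form agree.
lhs = np.einsum("abc,abcdef,def->", T, O6, T)
assert np.isclose(lhs, vec @ O_mat @ vec)
```

NumPy's default row-major order would instead make the last index the fastest one; either choice works as long as it is used consistently for the vector and the matrices.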


Next, we introduce the tensor $\mathcal{E}$ that corresponds to the Hamiltonian in
the same way as $\mathcal{O}$ corresponds to $O$ and will be read as a matrix. We also
define a tensor or matrix $\mathcal{N}$ that corresponds in the same way to the identity
"observable" $\mathbb{1}$. This allows us to rewrite expression (9.4) as

$$\frac{T^{[a]\dagger} \mathcal{E} \, T^{[a]}}{T^{[a]\dagger} \mathcal{N} \, T^{[a]}}. \qquad (9.6)$$

This expression is the quotient of two quadratic forms, and such an expression
is known as a generalized Rayleigh quotient. It can be minimized globally with
respect to the entries in $T^{[a]}$ by solving the generalized eigenvalue problem
for the matrix pencil $\mathcal{E} - \lambda \mathcal{N}$, i. e.,

$$\mathcal{E} \, T^{[a]} = \lambda \, \mathcal{N} \, T^{[a]}. \qquad (9.7)$$

The generalized eigenvector $T_0$ corresponding to the smallest generalized
eigenvalue $\lambda_0$ minimizes the energy, and $\lambda_0$ is this minimum energy. Hence,
by solving the generalized eigenvalue problem and then replacing $T^{[a]}$ with
$T_0$ (now read again as 3-index tensor), we have found the global minimum of
the energy as a function of the elements of $T^{[a]}$. At this point, it is important to
remark that even though we have found a global minimum here, it is merely
a global minimum for the restricted energy function that only allows the
entries in one tensor to be varied. There is a priori no reason to assume that
performing these steps for each tensor will lead to the true global minimum
of the optimization problem (9.4). However, numerical experiments show
that iterating this procedure several times over all tensors gives very good
results and seems to converge to the true global minimum in most cases,
although it can be proved that this does not always hold (by adapting the proof
of [Eis06]).
We need to solve the generalized eigenvalue problem (9.7). Algorithms
for this well-studied problem have been implemented, most notably in all
implementations of LAPACK [ABB+99] and its supersets. As both $\mathcal{E}$ and $\mathcal{N}$
are Hermitean, we may use a routine that takes advantage of this, such as
LAPACK's routine ZHEGV, which performs a Cholesky factorization$^6$ of $\mathcal{N}$ to
reduce the problem to an ordinary Hermitean eigenvalue problem, which is then
solved by a dense eigensolver (reduction to tridiagonal form followed by a
QR-type iteration). (For non-Hermitean matrices, LAPACK offers the routine
ZGEGV, which implements the QZ algorithm [MS73].)
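
The reduction via a Cholesky factorization can be sketched in a few lines. This is an illustration of the idea under the assumption that $\mathcal{N}$ is positive definite, written with NumPy only; it is not a call into ZHEGV, and the function name is hypothetical.

```python
import numpy as np

def lowest_generalized_eigenpair(E, N):
    """Smallest lam and vector t with E t = lam N t, for Hermitian E and
    Hermitian positive definite N."""
    Lc = np.linalg.cholesky(N)           # N = Lc Lc^dagger
    Linv = np.linalg.inv(Lc)
    A = Linv @ E @ Linv.conj().T         # ordinary Hermitian problem A y = lam y
    lam, Y = np.linalg.eigh(A)           # eigenvalues in ascending order
    t = Linv.conj().T @ Y[:, 0]          # back-transform: t = Lc^{-dagger} y
    return lam[0], t

# Small random test problem.
rng = np.random.default_rng(6)
n = 6
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
K = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
E = (M + M.conj().T) / 2                 # Hermitian
N = K @ K.conj().T + np.eye(n)           # Hermitian and positive definite
lam0, t0 = lowest_generalized_eigenpair(E, N)
assert np.allclose(E @ t0, lam0 * (N @ t0))
```

An explicit inverse of the Cholesky factor is used here for brevity; a production code would solve triangular systems instead.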


9.5.1 Dealing with linear dependencies 

When optimizing the tensor $T^{[a]}$, it is helpful to regard the tensor tree state as
a sum of the form

$$|\Psi(\mathcal{T})\rangle = \sum_{\alpha_1=0}^{\chi_1-1} \sum_{\alpha_2=0}^{\chi_2-1} \sum_{\alpha_3=0}^{\chi_3-1} T^{[a]}_{\alpha_1 \alpha_2 \alpha_3} \, |\alpha_1\rangle_1 |\alpha_2\rangle_2 |\alpha_3\rangle_3, \qquad (9.8)$$

where the ket $|\alpha\rangle_j$ denotes the state that lives on the spins that are connected
with tensor vertex $a$ via its edge $j$ and that results from contraction along
all the edges connected to the tensor vertex via edge $j$ if the index of this
edge is fixed to the value $\alpha$. In this notation, the meaning of the elements
of the matrices $\mathcal{N}$ and $\mathcal{E}$ in the generalized Rayleigh quotient may then be
understood as

$$\mathcal{N}_{\alpha_1' \alpha_2' \alpha_3' \alpha_1 \alpha_2 \alpha_3} = {}_3\langle\alpha_3'| \, {}_2\langle\alpha_2'| \, {}_1\langle\alpha_1'| \alpha_1\rangle_1 |\alpha_2\rangle_2 |\alpha_3\rangle_3,$$

$$\mathcal{E}_{\alpha_1' \alpha_2' \alpha_3' \alpha_1 \alpha_2 \alpha_3} = {}_3\langle\alpha_3'| \, {}_2\langle\alpha_2'| \, {}_1\langle\alpha_1'| \, H \, |\alpha_1\rangle_1 |\alpha_2\rangle_2 |\alpha_3\rangle_3.$$

If now one of the sets $\{|\alpha\rangle_j\}_{\alpha=0}^{\chi_j-1}$ ($j = 1, 2, 3$) is linearly dependent, the
matrix $\mathcal{N}$ becomes singular and some of its eigenvalues vanish. Then, $\mathcal{N}$ ceases
to have a Cholesky factorization, and the Cholesky-based solver cannot
be used any longer. We have tried to use the QZ algorithm [MS73] instead,
which is also implemented in LAPACK. However, the optimization is then
numerically unstable and seems to drift ever deeper into the region of singular $\mathcal{N}$ matrices.
This does not seem to be the fault of the QZ algorithm, but rather due to the fact that
generalized eigenvalue problems with singular or near-singular matrices are
inherently highly sensitive to small perturbations [Ste72].

6 A good textbook on numerical linear algebra that explains these terms is [TB97].

Hence, we try to "cut off" the singular part. To this end, we first
diagonalize $\mathcal{N}$,

$$\mathcal{N} = U D U^{\dagger},$$

where the diagonal matrix $D$ contains the eigenvalues in descending order,
so that all vanishing eigenvalues are found at the bottom. (There cannot be
any truly negative eigenvalues, and negative or positive values very close
to zero shall be deemed numerically zero, the imprecision being only due to
rounding errors.) Let $s$ be the number of zero eigenvalues, and $D'$ be the
$(\xi - s) \times (\xi - s)$ diagonal matrix of the non-vanishing eigenvalues (writing
$\xi = \chi_1 \chi_2 \chi_3$ for the size of $\mathcal{N}$ and $\mathcal{E}$). If we cut away the $s$ right-most columns
of $U$, yielding the rectangular $\xi \times (\xi - s)$ matrix $R$, we have

$$D' = R^{\dagger} \mathcal{N} R.$$

We transform $\mathcal{E}$ similarly to the $(\xi - s) \times (\xi - s)$ matrix

$$\mathcal{E}' = R^{\dagger} \mathcal{E} R.$$

We have lost no information by reducing the size of $\mathcal{E}$. To see this, note
that the original tensor $T^{[a]}$ can as well afford to lose this information. Let
us rewrite the $\chi_1 \times \chi_2 \times \chi_3$ tensor $T^{[a]}$ as a vector $T_a$ of length $\xi = \chi_1 \chi_2 \chi_3$,
transform it to length $\xi - s$ and back to length $\xi$ by the operation $\tilde{T}_a = R R^{\dagger} T_a$.
The state $|\Psi(\mathcal{T})\rangle$ as written in Eq. (9.8) does not change if $T^{[a]}$ is replaced by $\tilde{T}_a$
(written as a tensor again), as can be seen from a straightforward calculation.

As $D'$ is now manifestly positive definite, the generalized eigenvalue
problem $\mathcal{E}' T' = \lambda D' T'$ can be solved easily. As inversion of $D'$ is trivial (just
replace the diagonal entries by their reciprocals), we just have to solve an
ordinary eigenvalue problem $D'^{-1} \mathcal{E}' T' = \lambda T'$. The eigenvector $T_0'$ corresponding
to the lowest eigenvalue $\lambda_0$ is now a vector of length $\xi - s$, but can be
transformed to length $\xi$ using $R$,

$$T_0 = R \, T_0'.$$

The vector $T_0$, written again as rank-3 tensor of size $\chi_1 \times \chi_2 \times \chi_3$, is used to
overwrite the old tensor $T^{[a]}$.
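
The cut-off procedure might be sketched as follows, assuming NumPy. The function name and the relative tolerance `tol` used to decide which eigenvalues count as numerically zero are assumptions of this sketch; note also that `numpy.linalg.eigh` returns eigenvalues in ascending rather than descending order, so the columns to keep are selected by a mask.

```python
import numpy as np

def minimize_with_singular_norm(E, N, tol=1e-10):
    """Lowest generalized eigenpair of (E, N) after cutting off the
    numerically singular part of the Hermitian matrix N."""
    w, U = np.linalg.eigh(N)                     # N = U diag(w) U^dagger
    keep = w > tol * w.max()                     # non-vanishing eigenvalues
    R = U[:, keep]                               # xi x (xi - s)
    D = w[keep]                                  # diagonal of D' = R^dagger N R
    E_red = R.conj().T @ E @ R                   # E' = R^dagger E R
    lam, V = np.linalg.eig(E_red / D[:, None])   # D'^{-1} E' T' = lam T'
    k = np.argmin(lam.real)
    return lam[k].real, R @ V[:, k]              # T_0 = R T_0'

# Test problem with a rank-deficient N.
rng = np.random.default_rng(9)
n, r = 6, 4
K = rng.normal(size=(n, r)) + 1j * rng.normal(size=(n, r))
N = K @ K.conj().T                               # rank 4, hence singular
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
E = (M + M.conj().T) / 2
lam0, t0 = minimize_with_singular_norm(E, N)
quotient = ((t0.conj() @ E @ t0) / (t0.conj() @ N @ t0)).real
assert np.isclose(quotient, lam0)
```

Since $D'^{-1} \mathcal{E}'$ is not Hermitean, a general eigensolver is used here; one could equally symmetrize the problem with $D'^{-1/2}$ on both sides and use a Hermitean solver.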

9.6 Combining tensor trees and weighted graphs 

At the end of Sec. 9.4, we noticed that we need to enhance the class of tensor tree
states in order to make them suitable for 2D lattices. Here, we show how
observables can still be calculated efficiently if we augment the tensor tree
states by letting weighted phase gates act on all pairs of spins. Such
an augmented state is capable of showing the required amount of
entanglement (see Sec. 8.3.3). The fact that such a combination is possible without
losing the ability to efficiently evaluate observables was first noticed by
F. Verstraete and W. Dür [pers. comm.] and independently by J. Eisert and M.
Plenio [pers. comm.]. The algorithm that I present here is simpler and more
efficient than the original ansatz of Dür and Verstraete.

As in Chapter 8, we define the generalized phase gate, acting on two
$d$-level systems, as

$$W_{\Phi} = \sum_{s_1, s_2 = 0}^{d-1} |s_1 s_2\rangle \langle s_1 s_2| \, e^{i \Phi_{s_1 s_2}}, \qquad (9.9)$$

where the $d \times d$ matrix $\Phi$ contains the phases. We shall require, as in Ch. 8,
that $\Phi = \Phi^{T}$ and that $\Phi_{0,s} = \Phi_{s,0} = 0$ ($s = 0, \dots, d - 1$). As before, $\Phi$ is a
four-index tensor $\Phi^{ab}_{s_1 s_2}$, where the lower two indices specify the levels $s_1, s_2$
(cf. Eq. (9.9)) and the upper two the spin indices $a, b \in \{1, \dots, N\}$. With this,
we define the operator $W_{\Phi}$, which applies all phase gates,

$$W_{\Phi} = \prod_{a,b;\; a \lt b} W^{(a,b)}_{\Phi^{ab}}.$$

We consider states that result from applying $W_{\Phi}$ onto a tensor tree state
$|\Psi(\mathcal{T})\rangle$ with tensor network $\mathcal{T}$. We shall see that for a state $W_{\Phi} |\Psi(\mathcal{T})\rangle$ it is
possible to efficiently calculate the expectation value of any observable that
can be written as a sum of $K$ terms, each of which is a tensor product of local
operators of which only a small number $k$ is not the identity. The time to
calculate such an expectation value obviously scales linearly in $K$ and, as we
shall see, exponentially in $k$.
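
The two-system gate of Eq. (9.9) is a diagonal matrix and is easily written down explicitly. The sketch below assumes NumPy and a basis ordering in which $|s_1 s_2\rangle$ has position $s_1 d + s_2$; the function name `phase_gate` is illustrative.

```python
import numpy as np

def phase_gate(Phi):
    """W_Phi = sum_{s1,s2} |s1 s2><s1 s2| exp(i Phi[s1, s2]) as a d^2 x d^2 matrix."""
    return np.diag(np.exp(1j * np.asarray(Phi, dtype=float)).reshape(-1))

d = 3
rng = np.random.default_rng(7)
Phi = rng.uniform(0, 2 * np.pi, size=(d, d))
Phi = (Phi + Phi.T) / 2          # Phi = Phi^T
Phi[0, :] = 0.0                  # Phi_{0,s} = 0
Phi[:, 0] = 0.0                  # Phi_{s,0} = 0

W = phase_gate(Phi)
assert np.allclose(W @ W.conj().T, np.eye(d * d))    # W is unitary

# For d = 2 the only free phase is Phi_{11}: the familiar controlled phase gate.
W_qubit = phase_gate([[0.0, 0.0], [0.0, np.pi]])
assert np.allclose(np.diag(W_qubit), [1, 1, 1, -1])
```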

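Often, the observable of interest is the Hamiltonian $H$, and for many
systems $H$ is a sum of bi-local terms,

$$H = \sum_{(a,b) \in B} H_{ab},$$

where $B$ is the set of bonds. Bonds are pairs of spins acted on jointly by a
term in $H$. We restrict ourselves to this case to keep notation simple. The
energy is

$$E = \sum_{(a,b) \in B} \langle\Psi(\mathcal{T})| W_{\Phi}^{\dagger} H_{ab} W_{\Phi} |\Psi(\mathcal{T})\rangle = \sum_{(a,b) \in B} \operatorname{tr} H_{ab} \rho_{ab},$$

and we now discuss how to calculate the reduced density matrices

$$\rho_{ab} = \operatorname{tr}_{V \setminus \{a,b\}} W_{\Phi} |\Psi(\mathcal{T})\rangle \langle\Psi(\mathcal{T})| W_{\Phi}^{\dagger} = W^{(a,b)}_{\Phi^{ab}} \underbrace{\left( \operatorname{tr}_{V \setminus \{a,b\}} \tilde{W}_{ab} |\Psi(\mathcal{T})\rangle \langle\Psi(\mathcal{T})| \tilde{W}_{ab}^{\dagger} \right)}_{=: \tilde{\rho}_{ab}} W^{(a,b)\dagger}_{\Phi^{ab}} \qquad (V := \{1, \dots, N\}). \qquad (9.10)$$

Here, we have pulled $W^{(a,b)}_{\Phi^{ab}}$ out of the trace, and also taken advantage of the
cyclicity of the trace, which allows us to omit from $W_{\Phi}$ all those terms which
do not act on $a$ or $b$. We denote the operators remaining in the trace with

$$\tilde{W}_{ab} := \prod_{c \in V \setminus \{a,b\}} W^{(a,c)}_{\Phi^{ac}} W^{(b,c)}_{\Phi^{bc}}.$$
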

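Finally, we have to calculate $\tilde\rho_{ab}$, the underbraced part of
Eq. (9.10). To this end, we introduce, for a vector
$\varphi = (\varphi_0, \dots, \varphi_{d-1})$ of phases, the local diagonal
operator

\[ V_\varphi := \sum_{s=0}^{d-1} |s\rangle \langle s| \, e^{i \varphi_s} \]

and observe that

\[ W_\Phi = \sum_{s=0}^{d-1} |s\rangle \langle s| \otimes V_{\Phi_{\cdot,s}}, \tag{9.11} \]

where $\Phi_{\cdot,s}$ denotes the s-th column (starting to count from 0) of
the matrix $\Phi$.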
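
The decomposition in Eq. (9.11) can be checked numerically. The following is a minimal sketch in plain Python, assuming phases that obey the stated constraints (symmetric matrix, vanishing first row and column). All function names are hypothetical and chosen for illustration only; they are not taken from the code described in this thesis. Since every operator involved is diagonal, it suffices to compare diagonal entries.

```python
import cmath
import math
import random


def random_phase_matrix(d, rng):
    """Symmetric d x d matrix of phases whose row 0 and column 0 vanish."""
    phi = [[0.0] * d for _ in range(d)]
    for s in range(1, d):
        for t in range(s, d):
            phi[s][t] = phi[t][s] = rng.uniform(-math.pi, math.pi)
    return phi


def phase_gate_diagonal(phi):
    """Diagonal entries <s1 s2| W_Phi |s1 s2> as defined in Eq. (9.9)."""
    d = len(phi)
    return {(s1, s2): cmath.exp(1j * phi[s1][s2])
            for s1 in range(d) for s2 in range(d)}


def local_phase_diagonal(vec):
    """Diagonal entries of V_phi = sum_s |s><s| exp(i phi_s)."""
    return [cmath.exp(1j * x) for x in vec]


def decomposed_diagonal(phi):
    """Diagonal entries of sum_s |s><s| (x) V_{Phi_{.,s}} (right side of Eq. 9.11)."""
    d = len(phi)
    diag = {}
    for s in range(d):
        column = [phi[t][s] for t in range(d)]  # s-th column, counted from 0
        v = local_phase_diagonal(column)
        for t in range(d):
            diag[(s, t)] = v[t]
    return diag


phi = random_phase_matrix(3, random.Random(0))
lhs = phase_gate_diagonal(phi)
rhs = decomposed_diagonal(phi)
max_dev = max(abs(lhs[key] - rhs[key]) for key in lhs)
print(max_dev)
```

The check relies on the symmetry $\Phi = \Phi^T$: the s-th column and the s-th row of $\Phi$ coincide, which is why the column can be attached to the second system.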

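For the next step, note that the matrix elements
$\langle s'_a s'_b | \tilde\rho_{ab} | s_a s_b \rangle$ of $\tilde\rho_{ab}$
can trivially be written as follows:

\[ \langle s'_a s'_b | \tilde\rho_{ab} | s_a s_b \rangle = \langle\Psi(T)| \left[ \widetilde{W}_{ab}^\dagger \, (|s_a\rangle \langle s'_a|)^{(a)} \, (|s_b\rangle \langle s'_b|)^{(b)} \, \widetilde{W}_{ab} \right] |\Psi(T)\rangle \tag{9.12} \]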


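If the expression in square brackets were a product observable, we could
calculate the matrix element using the contraction algorithm of Sec. 9.2 on the
tensor tree state $|\Psi(T)\rangle$. The trick is now to note that, despite its
appearance, the expression in square brackets is in fact a tensor product of
local observables. To see this, we expand $\widetilde{W}_{ab}$, using
Eq. (9.11),

\[ \widetilde{W}_{ab} = \prod_{c \in V \setminus \{a,b\}} \left[ \sum_{s_a=0}^{d-1} (|s_a\rangle \langle s_a|)^{(a)} \, V^{(c)}_{\Phi^{ac}_{\cdot,s_a}} \right] \left[ \sum_{s_b=0}^{d-1} (|s_b\rangle \langle s_b|)^{(b)} \, V^{(c)}_{\Phi^{bc}_{\cdot,s_b}} \right], \]

and insert the expansion into Eq. (9.12). After cancelling all but one of the
many operators $(|s_a\rangle \langle s_a|)^{(a)}$ and
$(|s_b\rangle \langle s_b|)^{(b)}$, we use the "linearity" of $V_\varphi$
(i.e., $V_{\varphi_1} V_{\varphi_2} = V_{\varphi_1 + \varphi_2}$ and
$V_\varphi^\dagger = V_{-\varphi}$), and are left with

\[ \langle s'_a s'_b | \tilde\rho_{ab} | s_a s_b \rangle = \langle\Psi(T)| \, (|s_a\rangle \langle s'_a|)^{(a)} \, (|s_b\rangle \langle s'_b|)^{(b)} \prod_{c \in V \setminus \{a,b\}} V^{(c)}_{-\Phi^{ac}_{\cdot,s_a} + \Phi^{ac}_{\cdot,s'_a} - \Phi^{bc}_{\cdot,s_b} + \Phi^{bc}_{\cdot,s'_b}} \, |\Psi(T)\rangle, \tag{9.13} \]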


which is manifestly a product observable. 
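
As a sanity check of Eq. (9.13), the matrix elements of the underbraced part of Eq. (9.10) can be computed in two ways for a small random state: by brute force (apply the diagonal phase operators, then trace out all spins except a and b), and as the expectation value of a product of local operators. The sketch below uses a full state vector rather than a tensor tree, so it only illustrates the identity, not the efficient contraction. All names are hypothetical, and the phases are assumed symmetric with vanishing first row and column, identical for the orderings (a, b) and (b, a).

```python
import cmath
import itertools
import math
import random


def random_phases(n, d, rng):
    """phi[a][b] is a symmetric d x d matrix with vanishing row and column 0."""
    phi = [[[[0.0] * d for _ in range(d)] for _ in range(n)] for _ in range(n)]
    for a, b in itertools.combinations(range(n), 2):
        for s in range(1, d):
            for t in range(s, d):
                x = rng.uniform(-math.pi, math.pi)
                for p, q in ((a, b), (b, a)):
                    phi[p][q][s][t] = phi[p][q][t][s] = x
    return phi


def random_state(n, d, rng):
    """Normalised random state vector, stored as {configuration: amplitude}."""
    psi = {cfg: complex(rng.gauss(0, 1), rng.gauss(0, 1))
           for cfg in itertools.product(range(d), repeat=n)}
    norm = math.sqrt(sum(abs(z) ** 2 for z in psi.values()))
    return {cfg: z / norm for cfg, z in psi.items()}


def rho_tilde_direct(psi, phi, a, b, n, d):
    """Brute force: apply all gates coupling a or b to the rest, then trace out the rest."""
    rest = [c for c in range(n) if c not in (a, b)]
    chi = {cfg: amp * cmath.exp(1j * sum(phi[a][c][cfg[a]][cfg[c]]
                                         + phi[b][c][cfg[b]][cfg[c]] for c in rest))
           for cfg, amp in psi.items()}
    rho = {}
    for row in itertools.product(range(d), repeat=2):      # (s'_a, s'_b)
        for col in itertools.product(range(d), repeat=2):  # (s_a, s_b)
            total = 0j
            for env in itertools.product(range(d), repeat=len(rest)):
                cfg_row, cfg_col = [0] * n, [0] * n
                cfg_row[a], cfg_row[b] = row
                cfg_col[a], cfg_col[b] = col
                for c, e in zip(rest, env):
                    cfg_row[c] = cfg_col[c] = e
                total += chi[tuple(cfg_row)] * chi[tuple(cfg_col)].conjugate()
            rho[row, col] = total
    return rho


def product_expectation(psi, ops, n, d):
    """<psi| O_0 (x) ... (x) O_{n-1} |psi>, with ops[c][i][j] = <i|O_c|j>."""
    configs = list(itertools.product(range(d), repeat=n))
    total = 0j
    for bra in configs:
        for ket in configs:
            weight = psi[bra].conjugate() * psi[ket]
            for c in range(n):
                weight *= ops[c][bra[c]][ket[c]]
            total += weight
    return total


def rho_tilde_product(psi, phi, a, b, n, d):
    """Same matrix elements, from the product observable of Eq. (9.13)."""
    rho = {}
    for row in itertools.product(range(d), repeat=2):      # (s'_a, s'_b)
        for col in itertools.product(range(d), repeat=2):  # (s_a, s_b)
            ops = []
            for c in range(n):
                op = [[0j] * d for _ in range(d)]
                if c == a:
                    op[col[0]][row[0]] = 1.0   # |s_a><s'_a|
                elif c == b:
                    op[col[1]][row[1]] = 1.0   # |s_b><s'_b|
                else:
                    for e in range(d):         # V with the combined phase vector
                        theta = (phi[a][c][e][row[0]] - phi[a][c][e][col[0]]
                                 + phi[b][c][e][row[1]] - phi[b][c][e][col[1]])
                        op[e][e] = cmath.exp(1j * theta)
                ops.append(op)
            rho[row, col] = product_expectation(psi, ops, n, d)
    return rho


rng = random.Random(1)
n, d, a, b = 4, 2, 0, 2
phi = random_phases(n, d, rng)
psi = random_state(n, d, rng)
direct = rho_tilde_direct(psi, phi, a, b, n, d)
via_product = rho_tilde_product(psi, phi, a, b, n, d)
max_dev = max(abs(direct[key] - via_product[key]) for key in direct)
print(max_dev)
```

In an actual tensor-tree computation, `product_expectation` would be replaced by the contraction algorithm of Sec. 9.2; the point of the sketch is only that every factor handed to it is a local operator.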

Note that this form enables us to use the optimization scheme of Sec. 9.5
in order to update the tensors $T$ to yield a state
$W_{\boldsymbol{\Phi}} |\Psi(T)\rangle$ with minimal energy. When using this
scheme, the phases $\boldsymbol{\Phi}$ stay fixed while the tensors in $T$ are
optimized. In order to optimize the phases, we may keep the tensors fixed and
use a gradient-based minimization as in Sec. 8.6.


9.6.1 Implementation 

When carrying out a variational computation with tensor tree states, the
dominant computational effort is the calculation of the matrices E and N in
the Rayleigh quotient (9.6). For each element, a contraction of the kind
described in Sec. 9.2 has to be carried out, and hence this operation has to be
highly optimized. I have programmed the contraction in C++ and then used
SWIG [B+] to generate bindings that make the C++ classes representing the
tensor and leaf vertices usable from Python. This approach again follows the
strategy outlined in Sec. 8.A.2: only the inner loops are coded in a
high-performance language such as C++, and a modern dynamically typed
language, which allows for more rapid development, is used for the rest. For
the latter we have again chosen Python. The fixed interface between the C++
and the Python part allowed all researchers involved in the tensor-tree
project to work on the Python parts without any need to learn about the
internals of the C++ core; in fact, without any need to learn the C++
programming language at all.

The strategy turned out to be fruitful, and we have produced code to test
a wide variety of settings: tensor-tree states with and without phase gates
and, for comparison with DMRG, matrix-product states in tensor-tree form
(as in Fig. 9.2), as well as routines to describe different lattices in one
and two dimensions and different Hamiltonians, including the XY, XXZ,
Bose-Hubbard, and fermionic Hubbard models. Using the University's
high-performance compute clusters, we are, at the time of this writing,
testing the performance of the scheme for various choices of $\chi$, various
forms of the tree, and various symmetry restrictions of the phases.

In order to get decent computing times, we had to think carefully about
the control flow in the computation. For instance, it turned out to be
absolutely crucial to cache the results for sub-trees when going from one term
of the Hamiltonian to the next in the calculation of E, and even to retain
intermediate results when going from one active tensor to the next. For the
optimization of the phases, a gradient has to be obtained (similar to
Sec. 8.B) in order to use a gradient-based minimizer such as L-BFGS [BLN95].
Even then, the computational effort is still too large unless one attempts
further simplifications, such as only approximating the tree contraction
during the phase optimization rounds.
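
The idea of caching sub-tree results between Hamiltonian terms can be sketched as follows. This is a toy illustration in Python, not the C++ implementation described above: the tree layout, the operator labels, and the numerical values are made up, and the "contraction" is replaced by a simple product (which is what the expectation value of a product operator reduces to for a product state). The cache key of a sub-tree contains only the operators acting on the leaves below it, so a sub-tree on which two terms agree is contracted once.

```python
from functools import lru_cache

# Hypothetical toy tree: internal vertices map to their two children,
# leaves are represented by their integer site index.
TREE = {"root": ("left", "right"), "left": (0, 1), "right": (2, 3)}

# Made-up single-site expectation values standing in for leaf tensors.
LEAF_VALUE = {"I": 1.0, "X": 0.3, "Z": -0.8}

calls = []  # records every sub-tree that was actually contracted


def leaves_below(vertex):
    """Site indices of all leaves in the sub-tree rooted at `vertex`."""
    if isinstance(vertex, int):
        return (vertex,)
    left, right = TREE[vertex]
    return leaves_below(left) + leaves_below(right)


@lru_cache(maxsize=None)
def contract(vertex, local_ops):
    """Stand-in for contracting the sub-tree below `vertex`.

    `local_ops` holds one operator label per leaf below `vertex`, so the
    cache key depends only on the part of the term acting on this sub-tree.
    """
    calls.append(vertex)
    if isinstance(vertex, int):
        return LEAF_VALUE[local_ops[0]]
    left, right = TREE[vertex]
    k = len(leaves_below(left))
    return contract(left, local_ops[:k]) * contract(right, local_ops[k:])


def expectation(term):
    """Expectation value of one product term, given as one label per site."""
    return contract("root", tuple(term))


e1 = expectation(("X", "X", "I", "I"))
n_first = len(calls)              # contractions needed for the first term
e2 = expectation(("Z", "Z", "I", "I"))
n_second = len(calls) - n_first   # additional contractions for the second term
print(n_first, n_second)
```

The two terms act on the same bond, so the second evaluation reuses the cached result for the untouched right sub-tree and only recontracts the path from the changed leaves to the root.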

9.6.2 Preliminary results and conclusions 

At the time of this writing, we are still running calculations in order to see
for which systems the tensor trees, possibly with the weighted-phase-gate
extension, are viable and for which they are not. Unfortunately, first results
are not too encouraging. We have put much effort into finding optimal
algorithms, especially with respect to caching (i.e., storage of intermediate
results for later re-use), which brought down the time complexity
considerably. We also used advanced programming techniques to make sure that
the code makes optimal use of the machine's capabilities. Nevertheless,
convergence is clearly too slow and too weak.

Our efforts should be seen in the context of their competition with other
techniques. As we have seen in the overview of Ch. 6, there is a large variety
of techniques for treating spin systems to choose from. For one-dimensional
systems, DMRG is clearly the best choice to find ground states. It converges
quite fast and reaches an accuracy that leaves little to be desired, and thus
there is hardly any room left for further improvements. This was the reason
why we intended from the beginning to focus on two-dimensional systems.
Our attempts to use weighted graph states for this, as described in Chs. 7 and
8, have been of rather limited success: the system sizes that we could handle
were very modest, and the accuracy of the approximate state was insufficient
to see global properties such as momentum distributions. As we have seen in
Sec. 8.7, this is due to the highly non-linear dependence of observable
expectation values on the parameters, which leads, as we have found, to a huge
number of local minima. A generic optimization algorithm cannot be expected to
find a sufficiently good minimum among these many possibilities.

DMRG variants might be expected to fare better in two dimensions. Since
the first publication [VC04a], the authors of the PEPS method have improved
their implementation and recently presented results for a Bose-Hubbard system
[MVC07], which are, however, still limited to 11 × 11 sites. This cannot
match path-integral Monte Carlo in terms of system size and accuracy.
(Ref. [WATB04] treats 50 × 50 sites and does not need to resort to the
hard-core approximation.) On the other hand, the PEPS ansatz shows significant
promise for time evolution (as demonstrated by an example calculation in
[MVC07]) and might there be ahead of quantum Monte Carlo. According to
the authors of the PEPS papers [I. Cirac, pers. comm.], the main obstacles to
a more profitable use of the method are (i) the inability to cope with periodic
boundary conditions (which severely limits the possibility to extrapolate to
the thermodynamic limit) and (ii) the adverse scaling with the dimension of
the spins. Hence our hope that tensor trees might overcome these problems,
because they have a benign scaling with spin dimension (see Sec. 9.2). Of
course, tensor tree states cannot be expected to cope well with the
entanglement in two dimensions due to the area law (see Sec. 9.4), while the
entanglement properties of weighted graph states seem well suited for this (see
Sec. 8.3.3), no matter whether boundary conditions are open or periodic. The
ansatz to combine these two techniques (see Sec. 9.6) thus seemed a natural
choice.

In this combination, we depend crucially on the phase gates to provide
the necessary entanglement, and it does not come as a complete surprise that
we experience problems similar to those described in Ch. 8. When optimizing
the phases, the convergence is very slow and weak. Often, the energy
would hardly change for several hours of calculation time, only to then
suddenly and unexpectedly start to drop again, as if the minimization had to
traverse a long, shallow "trough" of the potential landscape to reach a "slide"
at its end. This not only costs time (we need several days for just a 6 × 6
system of Ising spins), it also keeps the state at rather high energies. We had
hoped that alternating the tensor and the phase optimization might allow us to
use the known good convergence of the former to boost the latter and push
it out of shallow troughs and local minima. While it is not entirely
implausible to expect this, our calculations so far do not show such synergy.
Rather, the results we have obtained so far are significantly behind any of the
other mentioned techniques in terms of precision as well as speed.

As we are still running calculations and have not yet entirely run out of
possibilities to tweak the code or improve the algorithms, it is too early to
give a final verdict. Nevertheless, a claim of success can definitely not be
made yet. It is likely that the conclusion of Sec. 8.7 will have to be upheld,
namely that a variational method based on phase gates needs an optimization
algorithm specifically tailored to this problem. This can only be achieved if
deep insights are gained into how the potential landscape is shaped by the
specific parametrization of the manifold of our class of variational states.

Appendix A

Software used 


A large variety of software was used in the work leading to this thesis. Many
of these programs are open source software, to a good part created in the
course of academic research. Hence it is proper to list and acknowledge at
least the most important of these programs. Giving a full list is probably
nearly impossible due to the vast number of intertwined programs in modern
computer systems.

All work was done using the GNU/Linux operating system, making extensive
use of the GNU Tools, especially the Bourne Again Shell (bash) and
the other GNU Core Utils (including the GNU Text Utils). (See [FSF] for more
information on the GNU Project.) NEdit was used as text editor.

The Mozilla suite and its derivatives were crucial for information access.

Programming was done with the GNU Compiler Collection (GCC,
[GCC]), most importantly with the GNU C++ compiler, but also with Intel's
C++ compiler. Perl and especially Python [R ] were used extensively, as well
as SWIG (the "simple wrapper and interface generator" [B+]). The GNU
Debugger (gdb) was used for debugging, along with Python's debugging
facilities. For symbolic calculations, Wolfram's Mathematica was employed.

Several numerics libraries were used, including the GNU Scientific Library
(GSL, [GTJ+03]), the NumPy system [Num], and the Intel Math Kernel
Library (MKL) as a highly optimized implementation of BLAS [FDD+02] and
LAPACK [ABB+99], next to ATLAS [WPD01] as an alternative implementation.

For presentation of the results, the TeX/LaTeX distribution teTeX was relied
on, and for plots and illustrations, Gnuplot, Grace, and OpenOffice.org were
used.

For completeness of the bibliography, I would also like to mention those
computing references and manuals that I relied on most heavily: for Python,
the books [LA04] and especially [Mar03]; for C++, the classic textbook [Str97]
and the standard library reference [Jos99]; for LaTeX, the works [MG04, Gra97,
Pak02].